Abstract

Many parallel processing networks can be viewed as graphs called k-ary n-cubes,
whose special cases include rings, hypercubes and toruses. In this paper, combinatorial
properties of k-ary n-cubes are explored. In particular, the problem of characterizing the
subgraph of a given number of nodes with the maximum edge count is studied. These
theoretical results are then used to compute a lower bounding function in branch-and-bound
partitioning algorithms and to establish the optimality of some irregular partitions.
1  Introduction


In a k-ary n-cube $G_{k,n}$, each node is identified by an n-digit base-k address
$b_{n-1} \ldots b_i \ldots b_0$, and for every dimension $i = 0, 1, \ldots, n-1$, it is connected
by edges to the nodes with addresses $b_{n-1} \ldots (b_i \pm 1 \bmod k) \ldots b_0$.
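The address-based definition above can be sketched in a few lines of Python. This is only an illustration: the function name `neighbors` and the choice of representing an address as a tuple of digits are assumptions made here, not notation from the paper.

```python
def neighbors(addr, k):
    """Return the neighbors of `addr` in the k-ary n-cube.

    `addr` is a tuple of n base-k digits. In every dimension the digit is
    moved by +1 and -1 modulo k; a set removes the duplicate that arises
    when k = 2 (where +1 and -1 give the same node).
    """
    result = set()
    for i, digit in enumerate(addr):
        for step in (1, -1):
            nb = addr[:i] + ((digit + step) % k,) + addr[i + 1:]
            if nb != addr:  # k = 1: the ring is a single point, no edge
                result.add(nb)
    return result
```

For example, `neighbors((0, 2), 3)` lists the four neighbors of node 02 in the 3-ary 2-cube.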

We can also define $G_{k,n}$ recursively. First, we define a ring of k nodes $0, 1, \ldots, k-1$ to
be a graph with edges between $i$ and $i + 1 \pmod{k}$ for $i = 0, 1, \ldots, k-1$. When k = 1, a
ring is a point. When k = 2, a ring is two nodes sharing an edge. When $k \ge 3$, a ring is a
conventional ring. The recursive definition of $G_{k,n}$ is as follows.

•  $G_{k,1}$ is a ring of k nodes. Without loss of generality, we place the k nodes on a line,
and call the leftmost node the 0th position node and the rightmost node the (k-1)st
position node.

•  $G_{k,n}$ contains k composite subcubes of type $G_{k,n-1}$ placed from left to right. For each
position $i = 0, \ldots, k^{n-1} - 1$, edges between composite subcubes are defined by connecting
all k ith position nodes in a ring.

Further, $G_{k,n}$ can also be viewed as an n-dimensional (n-D) torus, which is a
$\underbrace{k \times \cdots \times k}_{n}$ cube of grids with wrap-around edges.

The second and the third definitions of $G_{k,n}$ provide two ways of drawing $G_{k,n}$. See Figure
1 for an example.

Table 1 shows special cases of $G_{k,n}$. The first column contains the values of k, and the
first row contains the values of n. We notice that the class $G_{k,n}$ contains many topologies
important to parallel processing, such as rings, hypercubes and toruses; hence a thorough
study of $G_{k,n}$ is worthwhile.

The following combinatorial properties of $G_{k,n}$ are easy to verify, except perhaps the last
one, whose proof we provide in the Appendix.

Property 1.1  $G_{k,n}$ has $k^n$ nodes.

Property 1.2  $G_{k,n}$ contains k composite subcubes of type $G_{k,n-1}$, and the number of edges
with endpoints in different composite subcubes is $k^{n-1}$ for k = 2 and $k^n$ for $k \ge 3$.

Property 1.3  $G_{k,n}$ is a regular graph, meaning that each node has the same degree. The
degree of each node is n for k = 2 and 2n for $k \ge 3$.

Property 1.4  The number of edges in $G_{k,n}$ is $n k^{n-1}$ for k = 2 and $n k^n$ for $k \ge 3$.
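Properties 1.3 and 1.4 can be checked by brute force on small instances. The sketch below is an illustration under assumed names (`cube_edges`, `degree_set`, `expected_edge_count` are not from the paper); it enumerates all edges of the cube from the address definition and compares the counts with the stated formulas.

```python
from itertools import product

def cube_edges(k, n):
    """Enumerate the edges of the k-ary n-cube as frozensets of two addresses."""
    edges = set()
    for addr in product(range(k), repeat=n):
        for i in range(n):
            nb = addr[:i] + ((addr[i] + 1) % k,) + addr[i + 1:]
            if nb != addr:  # skip the degenerate ring with k = 1
                edges.add(frozenset((addr, nb)))
    return edges

def degree_set(k, n):
    """Return the set of node degrees; a regular graph yields a single value."""
    deg = {}
    for edge in cube_edges(k, n):
        for node in edge:
            deg[node] = deg.get(node, 0) + 1
    return set(deg.values())

def expected_edge_count(k, n):
    """Edge count stated in Property 1.4 (for k >= 2)."""
    return n * k ** (n - 1) if k == 2 else n * k ** n
```

Storing each edge as a set of two endpoints avoids counting the single edge of a 2-node ring twice.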

Property 1.5  In each ith composite subcube ($0 \le i \le k-1$) of type $G_{k,n-1}$ in $G_{k,n}$, choose
$m_i$ nodes, and define $m = \sum_{i=0}^{k-1} m_i$. The number of edges with endpoints among these m
nodes but in different composite subcubes is no larger than $\min\{m_0, m_1\}$ for k = 2, and is no
larger than $m - \max_{0 \le i \le k-1}\{m_i\} + \min_{0 \le i \le k-1}\{m_i\}$ for $k \ge 3$.

Figure 1: A 3-ary 2-cube $G_{3,2}$

Table 1: Special cases of $G_{k,n}$

Properties of k-ary n-cubes related to VLSI concerns have been explored by Dally [4]. One
property that is related to our study is the bisection width of k-ary n-cubes, the minimum
number of edges one must cut when partitioning the graph into two subgraphs with equal
numbers of nodes. Our work considers a generalization of this notion: given that the partition
may contain P subgraphs, what is the minimum number of edges between the subgraphs?

The problem of partitioning graphs for parallel processing includes rigorous treatments
in [8, 11], where algorithms are developed that partition graphs with guarantees on the load
imbalance and number of edges cut. Our work is similar in its rigor: restricted
to k-ary n-cubes, we give achievable lower bounds on partitioning costs.

We have previously studied properties of k-ary n-cubes in the context of load balancing
[10]. Here graph nodes typically represent computation and edges represent communication.
For any subgraph, define an internal edge to be one with two endpoints in the subgraph
and an external edge to be one with one endpoint in the subgraph; viewing the subgraph
as the set of nodes assigned to a processor, the number of external edges is a measure
of the communication cost. Allowing nodes and edges to be weighted (reflecting relative
computation and communication volumes, respectively), the "load" of a subgraph is taken
to be the sum of the weights of its nodes and its external edges. If $G_{k,n}$ is partitioned into
P subgraphs, the bottleneck cost of the partition is the maximum load among all partition
subgraphs [2, 12]. The bottleneck cost reflects that of one phase of a data parallel computation
where computation and communication are not overlapped, and a global synchronization
occurs at the end of the phase. The communication that occurs is needed for the subsequent
phase; there are no data dependencies among the computations performed in a given phase.
In a previous paper [10] we showed that certain equi-partitions are optimal in the sense of
minimizing the bottleneck cost, but that, surprisingly, there exist cases where the optimal
partition is not an equi-partition. These results are based on a lower bound on a processor's
communication cost, a bound that is achieved for selected subgraph sizes. The current paper
completes that work by identifying an achievable bound for general subgraph sizes.
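As a concrete reading of the load and bottleneck cost definitions, the following sketch computes the bottleneck cost of a partition under the simplifying assumption of unit node and edge weights. The function name `bottleneck_cost` and the representation of a partition as a list of sets of address tuples are illustrative choices, not taken from the paper or from [10].

```python
def bottleneck_cost(parts, k):
    """Maximum load over all parts, with unit node and edge weights.

    `parts` is a list of disjoint sets of addresses (tuples of base-k digits).
    The load of a part is its node count plus its number of external edges,
    i.e. edges with exactly one endpoint inside the part.
    """
    loads = []
    for part in parts:
        external = 0
        for addr in part:
            nbs = set()
            for i in range(len(addr)):
                for step in (1, -1):
                    nb = addr[:i] + ((addr[i] + step) % k,) + addr[i + 1:]
                    if nb != addr:
                        nbs.add(nb)
            external += sum(1 for nb in nbs if nb not in part)
        loads.append(len(part) + external)
    return max(loads)
```

Splitting the 3-ary 2-cube into its three rows, for instance, gives every part 3 nodes and 6 external edges.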

The problem of identifying the minimal communication cost (assuming unit edge weight)
of a subgraph of size m is the same as maximizing the number of internal edges in a subgraph
with m nodes, since each node in $G_{k,n}$ has the same degree. That is, we study the following
combinatorial problem.

Consider any subgraph $S_m$ of $m \le k^n$ nodes in $G_{k,n}$. Let $e(S_m)$ be the number
of internal edges in $S_m$. Define

$$e_k(m,n) = \max_{\forall S_m} \{ e(S_m) \}.$$

For any $m = 1, 2, \ldots, k^n$, determine $e_k(m,n)$, the maximum number of internal
edges in any subgraph $S_m$ in a k-ary n-cube.

We will say that a subgraph of $G_{k,n}$ with m nodes is optimal if it has $e_k(m,n)$ internal edges.
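For very small cubes, $e_k(m,n)$ can be obtained by exhaustive search, which is useful for checking the closed forms derived later. The sketch below is only a reference implementation under assumed names (`internal_edges`, `e_bruteforce`); its running time grows combinatorially, so it is not a practical algorithm. Since every node has the same degree d, a subgraph with m nodes and e internal edges has d·m - 2e external edges, which is why maximizing internal edges minimizes communication.

```python
from itertools import combinations, product

def internal_edges(nodes, k):
    """Count edges of the k-ary n-cube with both endpoints in `nodes`."""
    nodes = set(nodes)
    edges = set()
    for addr in nodes:
        for i in range(len(addr)):
            nb = addr[:i] + ((addr[i] + 1) % k,) + addr[i + 1:]
            if nb != addr and nb in nodes:
                edges.add(frozenset((addr, nb)))  # set avoids double counting for k = 2
    return len(edges)

def e_bruteforce(k, m, n):
    """Maximum internal edge count over all m-node subsets of the k-ary n-cube."""
    all_nodes = list(product(range(k), repeat=n))
    return max(internal_edges(subset, k) for subset in combinations(all_nodes, m))
```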

The case k = 1 is trivial: $e_1(m,n) = 0$ for $m \le 1^n = 1$. In Sections 2, 3, and 4, we will
determine $e_2(m,n)$, $e_3(m,n)$, and $e_4(m,n)$, respectively. In Section 5, we will study the case
$k \ge 5$. In Section 6, we present two applications of the results developed; one uses these
results in the context of branch-and-bound algorithms for partitioning k-ary n-cubes with
generally weighted nodes and edges. Finally, we summarize our contributions in Section 7.
The Appendix contains the proofs of Property 1.5 and of the lemmas contributing to the main results.

2  The  case  k  =  2 

To determine $e_2(m,n)$, the maximum number of internal edges of a subgraph with m nodes
in a hypercube, we will have to do some preliminary work.

Definition 2.1

$w(i)$ denotes the sum of all bits in the base-2 (binary) representation of i.

$W(i,j)$, $i \le j$, denotes the sum of $w(i), \ldots, w(j)$.

The following three lemmas concern properties of the function W. Their proofs can be found
in the Appendix.

Lemma 2.1  $W(i, 2i-1) = W(0, i-1) + i$ for $i \ge 1$.

Lemma 2.2  $W(i+1, 2i) = W(0, i-1) + i$ for $i \ge 1$.

Lemma 2.3  $W(j, j+i-1) \ge W(0, i-1) + i$ for $j \ge i \ge 1$.
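The functions of Definition 2.1 are easy to compute directly, and the three lemmas can be spot-checked numerically before reading their proofs. The following sketch uses the paper's names `w` and `W`; the helper `lemmas_hold` and its range limit are assumptions introduced only for this check.

```python
def w(i):
    """Sum of the bits in the binary representation of i."""
    return bin(i).count("1")

def W(i, j):
    """Sum of w(i), ..., w(j) for i <= j (empty sum if i > j)."""
    return sum(w(x) for x in range(i, j + 1))

def lemmas_hold(limit):
    """Check Lemmas 2.1-2.3 for all 1 <= i <= j <= limit."""
    for i in range(1, limit + 1):
        if W(i, 2 * i - 1) != W(0, i - 1) + i:      # Lemma 2.1
            return False
        if W(i + 1, 2 * i) != W(0, i - 1) + i:      # Lemma 2.2
            return False
        for j in range(i, limit + 1):
            if W(j, j + i - 1) < W(0, i - 1) + i:   # Lemma 2.3
                return False
    return True
```

A finite check of this kind is of course no substitute for the proofs in the Appendix; it only guards against transcription errors in the statements.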

We next define a recursive function F and give its closed form in terms of W.

Definition 2.2

$F(0) = F(1) = 0$;

$F(m) = F(\lceil m/2 \rceil) + F(\lfloor m/2 \rfloor) + \lfloor m/2 \rfloor$ for $m \ge 2$.

Theorem 2.1  $F(m) = W(0, m-1)$ for $m \ge 1$.

Proof  We induct on m. When m = 1, $F(1) = W(0,0) = 0$. Assume that the equation holds
for all values up to $m - 1$. Now consider m.

Case 1. $m = 2i$ for some $i \ge 1$.

$F(m) = F(i) + F(i) + i$  (Definition 2.2)

$= W(0, i-1) + W(0, i-1) + i$  (Inductive hypothesis)

$= W(0, i-1) + W(i, 2i-1)$  (Lemma 2.1)

$= W(0, 2i-1)$

$= W(0, m-1)$.

Case 2. $m = 2i + 1$ for some $i \ge 1$.

$F(m) = F(i+1) + F(i) + i$  (Definition 2.2)

$= W(0, i) + W(0, i-1) + i$  (Inductive hypothesis)

$= W(0, i) + W(i+1, 2i)$  (Lemma 2.2)

$= W(0, 2i)$

$= W(0, m-1)$.  ■
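Theorem 2.1 can be illustrated by evaluating both sides for a range of m. In the sketch below, `F` follows Definition 2.2 and `W` follows Definition 2.1; the memoization decorator is an implementation convenience and not part of the paper.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def F(m):
    """Recursive function of Definition 2.2."""
    if m <= 1:
        return 0
    return F((m + 1) // 2) + F(m // 2) + m // 2   # ceil(m/2), floor(m/2)

def W(i, j):
    """Sum of binary digit sums of i, ..., j (Definition 2.1)."""
    return sum(bin(x).count("1") for x in range(i, j + 1))

# Closed form of Theorem 2.1, evaluated for the first few values of m.
first_values = [F(m) for m in range(9)]
```

The first values of F are 0, 0, 1, 2, 4, 5, 7, 9, 12 for m = 0, ..., 8; the value 12 at m = 8 is the edge count of the full 3-dimensional hypercube.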

Corollary 2.1  $F(m) \ge F(m_0) + F(m_1) + \min\{m_0, m_1\}$ for $m_0 + m_1 = m$.

Proof  If at least one of $m_0$ and $m_1$ is 0, the inequality holds trivially. Now assume that
$m_0 \ge m_1 \ge 1$.

$F(m) = W(0, m-1)$  (Theorem 2.1)

$= W(0, m_0 - 1) + W(m_0, m-1)$

$\ge W(0, m_0 - 1) + W(0, m_1 - 1) + m_1$  (Lemma 2.3)

$= F(m_0) + F(m_1) + m_1$  (Theorem 2.1).  ■

Corollary 2.2  $F(m) = \frac{1}{2} m \log_2 m$ if $m = 2^l$ for some $l$.

Proof  Use Definition 2.2 and induction on m.  ■
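Both corollaries lend themselves to a quick numerical check. The sketch below re-implements `F` from Definition 2.2 and tests the inequality of Corollary 2.1 over all splits of m; the helper names `corollary_2_1_holds` and `corollary_2_2_holds` are hypothetical.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def F(m):
    """Recursive function of Definition 2.2."""
    if m <= 1:
        return 0
    return F((m + 1) // 2) + F(m // 2) + m // 2

def corollary_2_1_holds(limit):
    """Check F(m) >= F(m0) + F(m1) + min(m0, m1) for all m0 + m1 = m <= limit."""
    return all(
        F(m) >= F(m0) + F(m - m0) + min(m0, m - m0)
        for m in range(limit + 1)
        for m0 in range(m + 1)
    )

def corollary_2_2_holds(max_exp):
    """Check F(2^l) = (1/2) * 2^l * l, written as l * 2^(l-1) to stay in integers."""
    return all(2 * F(2 ** l) == l * 2 ** l for l in range(max_exp + 1))
```

For m = 6 the even split (3, 3) attains equality, 2 + 2 + 3 = 7 = F(6), whereas the split (5, 1) gives only 5 + 0 + 1 = 6.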

It  turns  out  that  F(m)  exactly  captures  the  quantity  of  interest. 

Theorem 2.2  $e_2(m,n) = F(m)$ for $m \le 2^n$.

Figure 2: Subgraphs of $G_{2,n}$ achieving internal edge count $F(m)$

Proof  Since $G_{2,n}$ contains two composite subcubes of type $G_{2,n-1}$, assume that $m_0$ and $m_1$
nodes are chosen in the 0th and 1st composite subcubes, respectively. By Property 1.5,

$$e_2(0,n) = e_2(1,n) = 0;$$

$$e_2(m,n) \le \max_{\forall\, m_0 + m_1 = m} \{ e_2(m_0, n-1) + e_2(m_1, n-1) + \min\{m_0, m_1\} \}.$$

First we prove by induction on m that $e_2(m,n) \le F(m)$. When m = 0, 1, $e_2(m,n) =
F(m) = 0$. Assume that the inequality holds for all values up to $m - 1$. Now consider m.

$$e_2(m,n) \le \max_{\forall\, m_0 + m_1 = m} \{ e_2(m_0, n-1) + e_2(m_1, n-1) + \min\{m_0, m_1\} \}$$

$$\le \max_{\forall\, m_0 + m_1 = m} \{ F(m_0) + F(m_1) + \min\{m_0, m_1\} \} \quad \text{(Inductive hypothesis)}$$

$$\le F(m) \quad \text{(Corollary 2.1).}$$

Next we prove that there is a subgraph $S_m^*$ of m nodes such that the number of internal
edges in $S_m^*$ is $F(m)$. Here is how we can allocate the m nodes for $S_m^*$: allocate $\lceil m/2 \rceil$ nodes
into the 0th composite subcube and $\lfloor m/2 \rfloor$ nodes into the 1st composite subcube; use the same
method recursively to allocate the nodes in each composite subcube. Each split contributes
$\lfloor m/2 \rfloor$ edges between the two composite subcubes, so the
number of internal edges in $S_m^*$ is exactly $F(m)$.  ■

This theorem tells us about the structure of a subgraph with exactly $F(m)$ internal
edges: it is possible to bisect this subgraph "evenly" with exactly $\lfloor m/2 \rfloor$ edges between the
two pieces, which are themselves optimal with respect to their sizes. Figure 2 illustrates
optimal subgraphs of $G_{2,n}$ for m = 3, 4, 5, 6.
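The recursive allocation used in the proof of Theorem 2.2 can be written out explicitly. The sketch below is an illustration under assumed names (`allocate`, `hypercube_internal_edges`); it builds the node set as tuples of bits, with the first bit selecting the composite subcube, and counts the internal edges so that the result can be compared with F(m).

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def F(m):
    """Recursive function of Definition 2.2."""
    if m <= 1:
        return 0
    return F((m + 1) // 2) + F(m // 2) + m // 2

def allocate(m, n):
    """Choose m <= 2**n nodes of the n-dimensional hypercube by the ceil/floor split."""
    if m == 0:
        return []
    if n == 0:
        return [()]  # a single node; only reached with m == 1
    upper, lower = (m + 1) // 2, m // 2
    return ([(0,) + a for a in allocate(upper, n - 1)]
            + [(1,) + a for a in allocate(lower, n - 1)])

def hypercube_internal_edges(nodes):
    """Count hypercube edges with both endpoints in `nodes`.

    Each edge is counted once, from its endpoint that has a 0 in the
    differing bit position.
    """
    chosen = set(nodes)
    return sum(
        1
        for a in chosen
        for i in range(len(a))
        if a[i] == 0 and a[:i] + (1,) + a[i + 1:] in chosen
    )
```

The count matches F(m) because the positions used for the smaller half are always a subset of those used for the larger half, so every node of the smaller half has a partner across the split.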

3  The  case  k  =  3 

As in the previous section, to determine $e_3(m,n)$ for $G_{3,n}$ we will have to do some
preliminary work.

Definition 3.1

$z(i)$ denotes the sum of all digits in the base-3 representation of i.

$Z(i,j)$, $i \le j$, denotes the sum of $z(i), \ldots, z(j)$.

The following five lemmas concern the properties of the function Z. Their proofs can be
found in the Appendix.

Lemma 3.1  $Z(0, 3i-1) = 3Z(0, i-1) + 3i$ for $i \ge 1$.

Lemma 3.2  $Z(0, 3i) = Z(0, i) + 2Z(0, i-1) + 3i$ for $i \ge 1$.

Lemma 3.3  $Z(0, 3i+1) = 2Z(0, i) + Z(0, i-1) + 3i + 1$ for $i \ge 1$.

Lemma 3.4  $Z(j, j+i-1) \ge Z(0, i-1) + i$ for $j \ge i \ge 1$.

Lemma 3.5  $Z(j, j+i_1+i_2-1) \ge Z(0, i_1-1) + Z(0, i_2-1) + i_1 + 2i_2$ for $j \ge i_1 \ge i_2 \ge 1$.

We next define a recursive function G and give its closed form in terms of Z.

Definition 3.2

$G(0) = G(1) = 0$;

$G(m) = (m \bmod 3)\, G(\lceil m/3 \rceil) + (3 - m \bmod 3)\, G(\lfloor m/3 \rfloor) + m - \lceil m/3 \rceil + \lfloor m/3 \rfloor$ for $m \ge 2$.

Theorem 3.1  $G(m) = Z(0, m-1)$ for $m \ge 1$.

Proof  Similar to the proof of Theorem 2.1. In the inductive step, we consider three cases:
$m = 3i$, $m = 3i + 1$, and $m = 3i + 2$, and use Lemmas 3.1, 3.2, and 3.3 in the three cases,
respectively.  ■
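As with F and W, the identity of Theorem 3.1 can be spot-checked by computing both sides. The sketch below uses the paper's names `z`, `Z` and `G`; the memoization is an implementation convenience.

```python
from functools import lru_cache

def z(i):
    """Sum of the digits in the base-3 representation of i."""
    total = 0
    while i:
        total += i % 3
        i //= 3
    return total

def Z(i, j):
    """Sum of z(i), ..., z(j) for i <= j."""
    return sum(z(x) for x in range(i, j + 1))

@lru_cache(maxsize=None)
def G(m):
    """Recursive function of Definition 3.2."""
    if m <= 1:
        return 0
    ceil3, floor3, r = (m + 2) // 3, m // 3, m % 3
    return r * G(ceil3) + (3 - r) * G(floor3) + m - ceil3 + floor3
```

For example, G(3) = 3 is the edge count of a 3-node ring, and G(9) = 18 is the edge count of the whole 3-ary 2-cube.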

Corollary 3.1  $G(m) \ge G(m_0) + G(m_1) + G(m_2) + m - \max\{m_0, m_1, m_2\} + \min\{m_0, m_1, m_2\}$
for $m_0 + m_1 + m_2 = m$.

Proof  If at least two of $m_0$, $m_1$ and $m_2$ are 0, the inequality holds trivially. If only one, say
$m_2$, is 0, assuming that $m_0 \ge m_1 \ge 1$, the derivation is almost identical to the same case in
the proof of Corollary 2.1 except here we use G instead of F and Z instead of W. If none of
$m_0$, $m_1$ and $m_2$ is 0, assuming that $m_0 \ge m_1 \ge m_2 \ge 1$, we have

$G(m) = Z(0, m-1)$  (Theorem 3.1)

$= Z(0, m_0 - 1) + Z(m_0, m-1)$

$\ge Z(0, m_0 - 1) + Z(0, m_1 - 1) + Z(0, m_2 - 1) + m_1 + 2m_2$  (Lemma 3.5)

$= G(m_0) + G(m_1) + G(m_2) + m - m_0 + m_2$  (Theorem 3.1).  ■

Corollary 3.2  $G(m) = m \log_3 m$ if $m = 3^l$ for some $l$.

Proof  Use Definition 3.2 and induction on m.  ■

It turns out that $G(m)$ exactly captures the quantity of interest.

Theorem 3.2  $e_3(m,n) = G(m)$ for $m \le 3^n$.
Proof  Since $G_{3,n}$ contains three composite subcubes of type $G_{3,n-1}$, assume that $m_0$, $m_1$ and
$m_2$ nodes are chosen in the 0th, 1st and 2nd composite subcubes, respectively. By Property
1.5,

$$e_3(0,n) = e_3(1,n) = 0;$$

$$e_3(m,n) \le \max_{\forall\, \sum m_i = m} \{ e_3(m_0, n-1) + e_3(m_1, n-1) + e_3(m_2, n-1) + m - \max\{m_0, m_1, m_2\} + \min\{m_0, m_1, m_2\} \}.$$

As in Theorem 2.2, we can prove by induction on m that $e_3(m,n) \le G(m)$, using
the above recursive bound on $e_3(m,n)$, the inductive hypothesis, and Corollary 3.1.

Also as in Theorem 2.2, a subgraph $S_m^*$ of m nodes with $G(m)$ internal edges can be
constructed by allocating $\lceil m/3 \rceil$ nodes into each of the first $m \bmod 3$ composite subcubes and
$\lfloor m/3 \rfloor$ nodes into each of the remaining composite subcubes; the same method is then used
recursively to allocate the nodes in each composite subcube.  ■

4  The  case  k  =  4 

As in the previous two sections, to determine $e_4(m,n)$ for $G_{4,n}$ we will have to do some
preliminary work. The following four lemmas concern additional properties of the function W.
Their proofs can be found in the Appendix.

Lemma 4.1  $W(0, 4i-1) = 4W(0, i-1) + 4i$ for $i \ge 1$.

Lemma 4.2  $W(0, 4i) = W(0, i) + 3W(0, i-1) + 4i$ for $i \ge 1$.

Lemma 4.3  $W(0, 4i+1) = 2W(0, i) + 2W(0, i-1) + 4i + 1$ for $i \ge 1$.

Lemma 4.4  $W(0, 4i+2) = 3W(0, i) + W(0, i-1) + 4i + 2$ for $i \ge 1$.

We next define a recursive function H and show that it is the same function as F defined
in Section 2.

Definition 4.1

$H(0) = H(1) = 0$;

$H(m) = (m \bmod 4)\, H(\lceil m/4 \rceil) + (4 - m \bmod 4)\, H(\lfloor m/4 \rfloor) + m - \lceil m/4 \rceil + \lfloor m/4 \rfloor$ for $m \ge 2$.

Theorem 4.1  $H(m) = W(0, m-1)$ for $m \ge 1$.

Proof  Similar to the proof of Theorem 2.1. In the inductive step, we consider four cases:
$m = 4i$, $m = 4i + 1$, $m = 4i + 2$, and $m = 4i + 3$, and use Lemmas 4.1, 4.2, 4.3, and 4.4 in
the four cases, respectively.  ■
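Theorems 2.1 and 4.1 together say that H and F are the same function even though their recursions split m into four and two parts, respectively. The sketch below evaluates both recursions side by side; `H` and `F` follow Definitions 4.1 and 2.2, and the helper `same_function` is a hypothetical name used only for the comparison.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def F(m):
    """Recursive function of Definition 2.2 (two-way split)."""
    if m <= 1:
        return 0
    return F((m + 1) // 2) + F(m // 2) + m // 2

@lru_cache(maxsize=None)
def H(m):
    """Recursive function of Definition 4.1 (four-way split)."""
    if m <= 1:
        return 0
    ceil4, floor4, r = (m + 3) // 4, m // 4, m % 4
    return r * H(ceil4) + (4 - r) * H(floor4) + m - ceil4 + floor4

def same_function(limit):
    """Check H(m) == F(m) for all 0 <= m <= limit."""
    return all(H(m) == F(m) for m in range(limit + 1))
```

This agreement reflects the fact that a ring of four nodes is itself a 2-dimensional hypercube.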

Corollary 4.1  $H(m) \ge H(m_0) + H(m_1) + H(m_2) + H(m_3) + m - \max\{m_0, m_1, m_2, m_3\} +
\min\{m_0, m_1, m_2, m_3\}$ for $m_0 + m_1 + m_2 + m_3 = m$.

Proof  If at least three of $m_0$, $m_1$, $m_2$ and $m_3$ are 0, the inequality holds trivially. If only
two, say $m_2$ and $m_3$, are 0, assuming that $m_0 \ge m_1 \ge 1$, the derivation is almost identical
to the same case in the proof of Corollary 2.1 except here we use H instead of F. If at most
one of $m_0$, $m_1$, $m_2$ and $m_3$ is 0, assuming that $m_0 \ge m_1 \ge m_2 \ge m_3$, we have

$H(m) = F(m)$  (Theorems 2.1 and 4.1)

$\ge F(m_0 + m_1) + F(m_2 + m_3) + m_2 + m_3$  (Corollary 2.1)

$\ge F(m_0) + F(m_1) + F(m_2) + F(m_3) + m_1 + m_2 + 2m_3$  (Corollary 2.1)

$= H(m_0) + H(m_1) + H(m_2) + H(m_3) + m_1 + m_2 + 2m_3$  (Theorem 4.1),

and $m_1 + m_2 + 2m_3 = m - m_0 + m_3$.  ■

Corollary 4.2  $H(m) = m \log_4 m$ if $m = 4^l$ for some $l$.

Proof  Use Definition 4.1 and induction on m.  ■

Theorem 4.2  $e_4(m,n) = H(m)$ for $m \le 4^n$.

Proof  Since $G_{4,n}$ contains four composite subcubes of type $G_{4,n-1}$, assume that $m_0$, $m_1$, $m_2$
and $m_3$ nodes are chosen in the 0th, 1st, 2nd and 3rd composite subcubes, respectively. By
Property 1.5,

$$e_4(0,n) = e_4(1,n) = 0;$$

$$e_4(m,n) \le \max_{\forall\, \sum m_i = m} \{ e_4(m_0, n-1) + e_4(m_1, n-1) + e_4(m_2, n-1) + e_4(m_3, n-1) + m - \max\{m_0, m_1, m_2, m_3\} + \min\{m_0, m_1, m_2, m_3\} \}.$$

As in Theorem 2.2, we can prove by induction on m that $e_4(m,n) \le H(m)$, using
the above recursive bound on $e_4(m,n)$, the inductive hypothesis, and Corollary 4.1.

Also as in Theorem 2.2, a subgraph $S_m^*$ of m nodes with $H(m)$ internal edges can be
constructed by allocating $\lceil m/4 \rceil$ nodes into each of the first $m \bmod 4$ composite subcubes and
$\lfloor m/4 \rfloor$ nodes into each of the remaining composite subcubes; the same method is then used
recursively to allocate the nodes in each composite subcube.  ■

5  The case $k \ge 5$

Given that essentially the same approach defines the structure of optimal subgraphs for three
successive values of k, one might suspect a general pattern for all k. It turns out that this
is not the case and that for $k \ge 5$ the decomposition that once defined optimal subgraphs
now defines suboptimal ones. Consider the example of k = 5, m = 6. If we partition in one
dimension into one subgraph of two nodes and four subgraphs of one node each, we achieve
six internal edges (a ring of five nodes, with one extra node hanging off the ring). However,
it is possible to embed the six-node graph illustrated in Figure 2 into $G_{5,n}$, and achieve seven
internal edges. An ability to embed subgraphs of $G_{2,n}$ into $G_{k,n}$ turns out to be what is
needed to characterize the optimal subgraphs of $G_{k,n}$ with m nodes, when $k \ge 5$ and $m \le 2^n$.
To prove this, we need the following theorem.
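The k = 5, m = 6 example can be reproduced by direct counting. The sketch below fixes n = 2 (an assumption made here to keep the search small; the text speaks of $G_{5,n}$ in general) and compares the ring-plus-one-node layout with a 2 x 3 grid layout. The names `internal_edges` and `max_internal_edges` are illustrative. The exhaustive search fixes one node at the origin, which loses nothing because translations of the torus map any node to any other.

```python
from itertools import combinations, product

def internal_edges(nodes, k):
    """Count edges of the k-ary n-cube with both endpoints in `nodes`."""
    nodes = set(nodes)
    edges = set()
    for addr in nodes:
        for i in range(len(addr)):
            nb = addr[:i] + ((addr[i] + 1) % k,) + addr[i + 1:]
            if nb != addr and nb in nodes:
                edges.add(frozenset((addr, nb)))
    return len(edges)

def max_internal_edges(k, n, m):
    """Exhaustive maximum over m-node subsets that contain the origin (m >= 1)."""
    all_nodes = list(product(range(k), repeat=n))
    origin, rest = all_nodes[0], all_nodes[1:]
    return max(internal_edges((origin,) + s, k) for s in combinations(rest, m - 1))

# One-dimensional decomposition: a full ring of five nodes plus one extra node.
ring_plus_one = [(0, c) for c in range(5)] + [(1, 0)]
# Hypercube-style embedding: a 2 x 3 grid of nodes.
grid_2x3 = [(r, c) for r in range(2) for c in range(3)]
```

Here `internal_edges(ring_plus_one, 5)` gives six edges and `internal_edges(grid_2x3, 5)` gives seven, in line with the example.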
Theorem 5.1  $F(m) \ge \sum_{i=0}^{k-1} F(m_i) + m - \max_{0 \le i \le k-1}\{m_i\} + \min_{0 \le i \le k-1}\{m_i\}$ for
$\sum_{i=0}^{k-1} m_i = m$.

Proof  Assume that $m_0 \ge m_1 \ge \cdots \ge m_{k-1} \ge 0$. Let $l$ be the smallest index such that
$\sum_{i=0}^{l} m_i \ge m/2$. Clearly, $\sum_{i=0}^{l-1} m_i < m/2$ and $\sum_{i=l}^{k-1} m_i > m/2$. This also implies
that $l < k - l$. So

$$F(m) \ge F\Big(\sum_{i=0}^{l} m_i\Big) + F\Big(\sum_{i=l+1}^{k-1} m_i\Big) + \min\Big\{\sum_{i=0}^{l} m_i, \sum_{i=l+1}^{k-1} m_i\Big\} \quad \text{(Corollary 2.1)}$$

$$\ge \sum_{i=0}^{k-1} F(m_i) + A + B + C \quad \text{(Corollary 2.1 repeatedly),}$$

where

$$A = \min\Big\{\sum_{i=0}^{l} m_i, \sum_{i=l+1}^{k-1} m_i\Big\},$$

$$B = \sum_{i=0}^{l-1} \min\Big\{m_i, \sum_{j=i+1}^{l} m_j\Big\},$$

and

$$C = \sum_{i=l+1}^{k-2} \min\Big\{m_i, \sum_{j=i+1}^{k-1} m_j\Big\}.$$

Next, we wish to prove that $A + B + C \ge m - m_0 + m_{k-1}$. Since $\sum_{i=0}^{l} m_i \ge m/2$,
$A = \sum_{i=l+1}^{k-1} m_i$. Since $l < k/2$ and $k \ge 5$, $l + 1 \le k - 2$. So there is at least one term in C.
Therefore, $C \ge m_{k-1}$. How large is B? If $l = 0$, then $B = 0$ and
$A + B + C \ge \sum_{i=1}^{k-1} m_i + m_{k-1} = m - m_0 + m_{k-1}$. If $l = 1$, then $B = m_1$ and
$A + B + C \ge \sum_{i=2}^{k-1} m_i + m_1 + m_{k-1} = m - m_0 + m_{k-1}$. Now assume that $l \ge 2$.
B must have at least two terms. If $m_h \le \sum_{i=h+1}^{l} m_i$ for all $h = 0, \ldots, l-2$,
then $B = \sum_{i=0}^{l-2} m_i + m_l$ and
$A + B + C \ge \sum_{i=l+1}^{k-1} m_i + \sum_{i=0}^{l-2} m_i + m_l + m_{k-1} \ge m - m_0 + m_{k-1}$.
If there is $h$ in $[0, l-2]$ such that $m_h > \sum_{i=h+1}^{l} m_i$ (choose the smallest h if there is more than
one), then $B \ge \sum_{i=0}^{h-1} m_i + \sum_{i=h+1}^{l} m_i$ and
$A + B + C \ge \sum_{i=l+1}^{k-1} m_i + \sum_{i=0}^{h-1} m_i + \sum_{i=h+1}^{l} m_i + m_{k-1} \ge m - m_0 + m_{k-1}$.  ■
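The inequality of Theorem 5.1 can be spot-checked over all ways of distributing m nodes among k = 5 composite subcubes. The sketch below is an illustration under assumed helper names (`compositions`, `bound_holds`); it also shows that the same inequality fails for three parts, which is consistent with the separate treatment of k = 3 through the function G.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def F(m):
    """Recursive function of Definition 2.2."""
    if m <= 1:
        return 0
    return F((m + 1) // 2) + F(m // 2) + m // 2

def compositions(m, k):
    """Yield all k-tuples of non-negative integers that sum to m."""
    if k == 1:
        yield (m,)
        return
    for first in range(m + 1):
        for rest in compositions(m - first, k - 1):
            yield (first,) + rest

def bound_holds(parts):
    """Check F(m) >= sum F(m_i) + m - max(m_i) + min(m_i) for m = sum(parts)."""
    m = sum(parts)
    return F(m) >= sum(F(p) for p in parts) + m - max(parts) + min(parts)
```

With three parts of one node each, the right-hand side is 3 (a triangle), while F(3) = 2, so the bound is specific to rings too long to close cheaply.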


Theorem 5.2  $e_k(m,n) = F(m)$ for $m \le 2^n$ and $k \ge 5$.

Proof  Since $G_{k,n}$ contains k composite subcubes of type $G_{k,n-1}$, assume that $m_i$ nodes are
chosen in the ith composite subcube for $0 \le i \le k-1$. By Property 1.5,

$$e_k(0,n) = e_k(1,n) = 0;$$

$$e_k(m,n) \le \max_{\forall\, \sum m_i = m} \Big\{ \sum_{i=0}^{k-1} e_k(m_i, n-1) + m - \max_{0 \le i \le k-1}\{m_i\} + \min_{0 \le i \le k-1}\{m_i\} \Big\}.$$

As in Theorem 2.2, we can prove by induction on m that $e_k(m,n) \le F(m)$, using
the above recursive bound on $e_k(m,n)$, the inductive hypothesis, and Theorem 5.1.
Figure 3: Construction procedure for $C_2(m)$ (axes: dim 1 and dim 2)

Also as in Theorem 2.2, a subgraph $S_m^*$ of $m \le 2^n$ nodes with $F(m)$ internal edges
can be constructed by allocating $\lceil m/2 \rceil$ nodes into the 0th composite subcube and $\lfloor m/2 \rfloor$ nodes
into the 1st composite subcube; the same method is then used recursively to allocate the
nodes in each composite subcube.  ■

What then of subgraphs of size $m > 2^n$? For this case we assume that either k is so large
relative to m that an optimal subgraph cannot include wrap-around edges, or that the graph
of interest is a mesh (without wrap-around edges) whose local structure is like that of $G_{k,n}$. In
other words, we now also consider multi-dimensional rectangular meshes, structures we will
call n-D meshes. Intuition tells us that the maximum number of internal edges $e_k(m,n)$ can
be reached when the m nodes are placed as tightly as possible to form a "cubish" polyhedron.
In the remainder of this section, we shall prove that this intuition is correct.

In any dimension i, a subgraph of m nodes can be partitioned into layers, each of which
contains nodes with the same coordinate in dimension i. Furthermore, there may be edges
(legs) between adjacent layers. We give the following definition of a cubish polyhedron.

Definition 5.1

For any $m \ge 2$, there exist $l \ge 2$ and $1 \le i \le n$ such that
$l^{i-1}(l-1)^{n-i+1} < m \le l^{i}(l-1)^{n-i}$. Let $\delta = m - l^{i-1}(l-1)^{n-i+1}$. The n-D cubish
polyhedron of m nodes in $G_{k,n}$, denoted as $C_n(m)$, is defined recursively as follows.

•  $C_1(m)$ is a line of m nodes.

•  To construct $C_n(m)$, we start with an
$\underbrace{l \times \cdots \times l}_{i-1} \times \underbrace{(l-1) \times \cdots \times (l-1)}_{n-i+1}$ n-D mesh. For
the remaining $\delta$ nodes, we construct an (n-1)-D layer $C_{n-1}(\delta)$ and add it on the top
of the n-D mesh in dimension i.

The above procedure of constructing $C_n(m)$ is very much like making a ball of yarn.
The idea is to fill in each side (dimension) with yarn (nodes), one side (dimension) at a
time. Figure 3 illustrates the construction procedure for $C_2(m)$, and Figure 4 illustrates the
procedure for $C_3(m)$. Let $e_n(m)$ be the internal edge count in a cubish polyhedron $C_n(m)$.
Obviously, $e_n(m) = [e_{n-1}(\delta) + \delta] + e_n(m - \delta)$.
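The recurrence for $e_n(m)$ can be evaluated directly from Definition 5.1. The sketch below is one possible reading of that definition: it searches for the pair (l, i), counts the edges of the rectangular base mesh in closed form, and recurses on the top layer of $\delta$ nodes. The names `mesh_edges` and `cubish_edges`, and the convention that a polyhedron with at most one node has no edges, are assumptions made for this illustration.

```python
from math import prod

def mesh_edges(sides):
    """Edge count of a rectangular mesh (no wrap-around) with the given side lengths."""
    size = prod(sides)
    # In each dimension of length s there are (s - 1) edges per line and size // s lines.
    return sum((s - 1) * (size // s) for s in sides)

def cubish_edges(n, m):
    """Internal edge count e_n(m) of the n-D cubish polyhedron C_n(m)."""
    if m <= 1:
        return 0
    if n == 1:
        return m - 1  # a line of m nodes
    l = 2
    while True:
        for i in range(1, n + 1):
            base = l ** (i - 1) * (l - 1) ** (n - i + 1)
            full = l ** i * (l - 1) ** (n - i)
            if base < m <= full:
                delta = m - base
                sides = [l] * (i - 1) + [l - 1] * (n - i + 1)
                # base mesh + top layer C_{n-1}(delta) + delta legs to the mesh
                return mesh_edges(sides) + cubish_edges(n - 1, delta) + delta
        l += 1
```

For n = 2 the values for m = 1, ..., 12 are 0, 1, 2, 4, 5, 7, 8, 10, 12, 13, 15, 17, and a full 3 x 3 x 3 block gives `cubish_edges(3, 27) == 54`.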
Figure 4: Construction procedure for $C_3(m)$


Figure 5: Rearrange $S_m$ (in $G_{k,2}$) without decreasing $e(S_m)$


Theorem 5.3  $C_n(m)$ has the maximum internal edge count among all subgraphs $S_m$ of m
nodes in $G_{k,n}$ (or in n-D meshes), when the wrap-around edges can be discounted.

Proof  We prove by induction on n. When n = 1, the claim is trivially true. Assume that
the claim holds true for $n - 1$. Now consider the case of n. Let $S_m$ be any subgraph of m
nodes with $e(S_m)$ internal edges in $G_{k,n}$. We wish to prove that $e(S_m) \le e_n(m)$.

We can view $S_m$ as having several (n-1)-D layers of nodes stacked on each other in a
certain dimension. Rearrange the order of the layers by sizes (node counts) and within each
layer rearrange the nodes into an (n-1)-D cubish polyhedron. See Figure 5 for an example
(the numbers in the figure are the sizes of the layers). If after this rearrangement there are
h layers and $s_i$ is the size of the ith layer with $s_1 \le s_2 \le \cdots \le s_h$, then by the inductive
hypothesis we have

$$e(S_m) \le [e_{n-1}(s_1) + s_1] + [e_{n-1}(s_2) + s_2] + \cdots + [e_{n-1}(s_{h-1}) + s_{h-1}] + e_{n-1}(s_h).$$

Note that $s_1 + s_2 + \cdots + s_{h-1}$ is the number of edges (legs) between adjacent layers.

We have a few observations about the new subgraph obtained. First, layers in each
dimension (not just the dimension chosen in the rearrangement) are stacked on each other
by sizes. Second, $h \ge l$. Assume that $h \le l - 1$ for all dimensions. We must have
$m \le (l-1)^n$, which is impossible. Third, $s_1 \le l^{i-1}(l-1)^{n-i}$. Suppose not. We must have
$m = s_1 + \cdots + s_h \ge h s_1 \ge l s_1 > l^{i}(l-1)^{n-i}$, which is impossible.
Let us go back to the induction step, in which we assume that $e_{n-1}(m)$ is maximum and
wish to prove that $e_n(m)$ is maximum. We need another induction on m to prove this. When
m = 1, 2, $e_n(m)$ is obviously maximum. Assume that $e_n(j)$ is maximum for $j \le m - 1$. Now
consider the case j = m. We know by the inductive hypothesis that

$$e(S_m) \le [e_{n-1}(s_1) + s_1] + e_n(m - s_1).$$

By Definition 5.1, we know that $C_n(m - \delta)$ is in fact an n-D mesh with $l^{i-1}(l-1)^{n-i+1}$
nodes. $C_n(m - \delta)$ can also be viewed as having $l$ (or $l - 1$ if i = 1) layers stacked on each
other, where each layer is an (n-1)-D mesh and has L nodes. Clearly,

$$L = \begin{cases} (l-1)^{n-1} & \text{if } i = 1; \\ l^{i-2}(l-1)^{n-i+1} & \text{if } i \ge 2. \end{cases}$$

We can show that $s_1 \le L + \delta$. Suppose not. We must have
$m \ge h s_1 \ge l s_1 > lL + l\delta \ge lL + \delta \ge m$, which is impossible. To continue, we consider two cases.

Case 1. $s_1 \le \delta$. We must have $l^{i-1}(l-1)^{n-i+1} \le m - s_1 < l^{i}(l-1)^{n-i}$. Let
$m - s_1 = l^{i-1}(l-1)^{n-i+1} + \delta'$. Then $s_1 + \delta' = \delta$. So

$$e_n(m - s_1) = [e_{n-1}(\delta') + \delta'] + e_n(l^{i-1}(l-1)^{n-i+1})$$

and

$$e_{n-1}(s_1) + e_{n-1}(\delta') \le e_{n-1}(\delta).$$

Therefore,

$$e(S_m) \le [e_{n-1}(s_1) + s_1] + e_n(m - s_1)$$

$$= [e_{n-1}(s_1) + s_1] + [e_{n-1}(\delta') + \delta'] + e_n(l^{i-1}(l-1)^{n-i+1})$$

$$\le [e_{n-1}(\delta) + \delta] + e_n(l^{i-1}(l-1)^{n-i+1})$$

$$= e_n(m).$$


Case 2. $s_1 > \delta$. Since $s_1 \le L + \delta$, we must have $(l'-1)L \le m - s_1 < l'L$, where $l' = l - 1$
if i = 1 and $l' = l$ if $i \ge 2$. Let $m - s_1 = (l'-1)L + \delta'$, where $\delta' < L$. Then $s_1 + \delta' = L + \delta$.
So

$$e_n(m - s_1) = [e_{n-1}(\delta') + \delta'] + e_n((l'-1)L).$$

Therefore,

$$e(S_m) \le [e_{n-1}(s_1) + s_1] + e_n(m - s_1)$$

$$= [e_{n-1}(s_1) + s_1] + [e_{n-1}(\delta') + \delta'] + e_n((l'-1)L).$$

On the other hand, we have

$$e_n(m) = [e_{n-1}(\delta) + \delta] + [e_{n-1}(L) + L] + e_n((l'-1)L).$$

To show that $e(S_m) \le e_n(m)$, all we need to prove is that for $s_1 + \delta' = L + \delta$,

$$e_{n-1}(s_1) + e_{n-1}(\delta') \le e_{n-1}(L) + e_{n-1}(\delta).$$

The inequality is trivially true when $s_1 = L$. Let us consider the following subcases.

Subcase 2.1. $s_1 < L$. We will prove by yet another induction on dimension $n - 1$ that
$e_{n-1}(s_1) + e_{n-1}(\delta') \le e_{n-1}(L) + e_{n-1}(\delta)$, where $s_1, \delta' < L$ and $s_1 + \delta' = L + \delta$. When $n - 1 = 1$,
it is a trivial case. Assume that the inequality holds for dimension $n - 2$. Now consider the
case of $n - 1$. Without loss of generality, assume that $s_1 \ge \delta'$ (the case $s_1 < \delta'$ is symmetric).
Initialize A and B to be $C_{n-1}(s_1)$ and $C_{n-1}(\delta')$, respectively. The node count in A, denoted
as $|A|$, is then $s_1$, and $|B|$ is $\delta'$. Consider A as a cubish polyhedron of several (n-2)-D layers
of size $L'$ each plus one more layer of $a \le L'$ nodes and $a$ legs on the top, and B as a cubish
polyhedron of several (n-2)-D layers of size $L''$ each plus one more layer of $b \le L''$ nodes
and $b$ legs on the top. Since $|A| \ge |B|$, A completely includes B. So $L' \ge L''$. We next apply
the following step to move nodes from B to A. If there is a layer in B with size no greater
than $L - |A|$, move the layer together with its legs to A and rearrange the two polyhedrons into
cubish polyhedrons again (note that after the move A, B, $L'$, $L''$, $a$, and $b$ are updated). It is
clear that this step does not decrease the total edge count in the two polyhedrons. Apply the
above step until every layer in B has size larger than $L - |A|$. We must have $L - |A| < L'$
and $a + b \ge L'$. Since $a, b \le L'$, by the inductive hypothesis,

$$e_{n-2}(a) + e_{n-2}(b) \le e_{n-2}(L') + e_{n-2}(a + b - L').$$

Removing the top layer of $a$ nodes and the top layer of $b$ nodes from A and B, respectively,
and adding a layer of $L'$ nodes and a layer of $a + b - L'$ nodes to A and B, respectively, we
get $|A| = L$ and $|B| = \delta$. So

$$e_{n-1}(s_1) + e_{n-1}(\delta') \le e_{n-1}(L) + e_{n-1}(\delta).$$

Subcase  2.2.  si  >  L.  Assume  that  =  L  +  g,  then  S  —  S'  +  g.  We  have 

en-i(si)  =  [en-2(s0  +  ff]  +  e„_i  (X). 

We can show that $\delta' \ge (l - 1)g$. Suppose not. Then we must have $m = (l' - 1)L + \delta' + s_1 <
(l' - 1)L + (l - 1)g + L + g \le l(L + g) = l s_1 \le m$, which is impossible. We know that
$\delta' < L$. If all dimensions in $C_{n-1}(\delta')$ have at least $l$ layers, then $\delta' \ge l^{n-2}(l - 1) \ge L$, which
is impossible. So there must be a dimension in $C_{n-1}(\delta')$ which has fewer than $l$ layers. Since
$\delta' \ge (l - 1)g$, there must be a layer with at least $g$ nodes. So we can move the layer of $g$
nodes together with its legs from $C_{n-1}(s_1)$ to $C_{n-1}(\delta')$ safely and get

$$[e_{n-2}(g) + g] + e_{n-1}(\delta') \le e_{n-1}(\delta' + g).$$

Therefore, 

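$$\begin{aligned}
e_{n-1}(s_1) + e_{n-1}(\delta') &= e_{n-1}(L) + [e_{n-2}(g) + g] + e_{n-1}(\delta') \\
&\le e_{n-1}(L) + e_{n-1}(\delta' + g) \\
&= e_{n-1}(L) + e_{n-1}(\delta). \qquad \blacksquare
\end{aligned}$$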
6  Applications  to  Partitioning


The results so far, besides having theoretical interest, have practical applications to
partitioning. There are several ways in which k-ary n-cubes are appropriate descriptions of
parallel computations. One is when, at the lowest level, the communication pattern of the
computation is that of a k-ary n-cube, e.g., a mesh-oriented computation with periodic
boundary conditions. Another is when the communication patterns reflect a k-ary n-cube
because the computation is about a k-ary n-cube. For instance, the computation may be a
direct-execution simulation of an application running on an architecture whose communication
network is $G_{k,n}$ [1, 5]; we partition the simulation in order to balance the simulation
workload and minimize communication overheads. Another instance is when the computation
is written as though it executes on all nodes of a k-ary n-cube architecture, but the program
is to be "folded" onto fewer processors, with the subgraphs defined by the folding reflecting
sets of tasks that are multi-tasked on one node of an actual machine [7].

To illustrate these points we show how our results may be used in the context of
branch-and-bound algorithms for partitioning. Our object here is neither to propose the specifics
of such an algorithm nor to study its performance. The ability to construct lower bounds on
communication costs based only on subgraph node size is one that can be used in a variety of
branch-and-bound formulations, and for a variety of partitioning problem formulations. We
will illustrate its use in one specific case.

The results can also be used to show the optimality of some curiously shaped partitions;
an example of this application is given in Section 6.2.

6.1  Lower  Bounding  in  Branch-and-Bound 

Consider a data parallel computation whose communication structure can be viewed as a
k-ary n-cube, or related structure. The nodes of the graph are weighted individually to reflect
computation costs, and the edges of the graph are weighted to reflect communication costs. It
is assumed that communication between co-resident nodes is free; alternatively, with minor
modifications one could model such internal communication with smaller, but nonzero,
costs. We wish to find a rectilinear partitioning [9] of the graph into $P$ subgraphs such that
the bottleneck cost (the maximum, over all subgraphs, of the sum of the total node weight and
the total external edge weight of the subgraph) is minimized. A rectilinear partition is
one in which the separating cuts are all hyperplanes of the form $x_i = c_{ij}$, where $c_{ij}$ is a constant. A
rectilinear partition of an 8 x 8 mesh is illustrated in Figure 6. Rectilinear partitions preserve
the nearest-neighbor communication structure of mesh-like communication patterns, and
have other desirable properties [9].

Our  earlier  work  on  rectilinear  partitioning  established  that  for  dimensions  larger  than 
two,  the  problem  of  finding  an  optimal  partition  is  intractable.  Furthermore,  that  work 
did  not  explicitly  include  communication  costs.  The  results  in  this  paper  can  be  used  in 
branch-and-bound  algorithms  [3]  for  finding  rectilinear  partitions,  as  we  now  show. 

Figure 6: Rectilinear partition of an 8 x 8 mesh

A node in the branch-and-bound search tree reflects a set of cuts already made; the initial
node is empty. The children of a node reflect various ways of choosing one additional cut.
If  there  are  c  cuts  to  be  made,  the  search  tree  has  depth  c  +  1.  Every  solution  is  a  leaf  of 
the  search  tree.  We  assume  that  the  relative  positioning  of  the  cut  associated  with  a  level 
is  known  a  priori,  e.g.,  the  cut  in  the  third  dimension  whose  cut  coordinate  is  fifth  smallest. 
Selecting  the  cut  order  is  part  of  the  branch-and-bound  solution,  but  our  focus  here  is  on  the 
lower  bounding  function  needed  for  the  branch-and-bound  approach. 

For  every  node  N  in  the  search  tree  we  associate  a  function  bnd(N),  that  provides  a  lower 
bound  on  the  bottleneck  cost  of  any  solution  rooted  at  that  node.  bnd(N)  can  be  used  to 
direct  the  search  in  different  ways,  e.g.,  in  choosing  the  next  node  to  explore  or  in  pruning  the 
search  beyond  that  node  because  a  known  solution  is  better  than  any  solution  rooted  at  N. 
We  are  interested  in  defining  an  easily  computed  function  bnd(N).  Each  node  N  reflects  the 
partitioning  of  the  graph  into  some  number  of  regions;  furthermore,  under  our  assumptions 
we know how many further divisions will be applied to each region. Consider a region $R$, to
be further divided into $s$ subregions. Suppose that the number of nodes in region $R$ is $r$, that
the sum of all node weights in $R$ is $W_R$, and that the edge weights of all edges with at least
one node in $R$ are sorted in list $E$ in non-decreasing order.

We wish to construct a lower bound $lb(R)$ on the minimal bottleneck cost due to any
possible subdivision of $R$ into $s$ subregions. The method we use relies on an ability to compute
sizes of subregions $m_1, m_2, \ldots, m_s$, with $m_i \ge 1$ for all $i$ and $\sum_{i=1}^{s} m_i = r$, such that $\sum_{i=1}^{s} C(m_i)$
is minimized, where $C(m_i)$ is the cost (external edge count) of an optimal subgraph with $m_i$
nodes. Note that since all nodes in a k-ary n-cube have the same degree $d$, which is $n$ for
$k = 2$ and $2n$ for $k \ge 3$, we have that $C(m_i) = d\,m_i - 2e_k(m_i, n)$. Solving this minimization
problem, even when it is modified to include a constraint $m_i \le B$ for all $i$, is straightforward
using dynamic programming.
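
The dynamic program mentioned above can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function name `min_total_cost` and the caller-supplied `cost` function (standing in for $C$) are assumptions introduced here, and the toy concave cost used in the usage note is not a real external-edge count.

```python
from math import inf

def min_total_cost(r, s, cost, B=None):
    """Split r nodes into s parts, each of size 1..B, minimizing sum(cost(m_i)).

    `cost` is a hypothetical stand-in for C(m); B defaults to "no cap".
    Returns (minimal total cost, list of part sizes).
    """
    if B is None:
        B = r
    if not (1 <= s <= r <= s * B):
        raise ValueError("need 1 <= s <= r <= s*B")
    # best[t][x]: least cost of splitting x nodes into exactly t parts
    best = [[inf] * (r + 1) for _ in range(s + 1)]
    choice = [[0] * (r + 1) for _ in range(s + 1)]
    best[0][0] = 0
    for t in range(1, s + 1):
        for x in range(t, min(r, t * B) + 1):
            # the last part takes m nodes; the other t-1 parts need >= 1 each
            for m in range(1, min(B, x - (t - 1)) + 1):
                cand = best[t - 1][x - m] + cost(m)
                if cand < best[t][x]:
                    best[t][x] = cand
                    choice[t][x] = m
    # walk the recorded choices backwards to recover the part sizes
    sizes, x = [], r
    for t in range(s, 0, -1):
        sizes.append(choice[t][x])
        x -= choice[t][x]
    return best[s][r], sizes
```

For example, with the toy concave cost `lambda m: 10 * m - m * m`, calling `min_total_cost(7, 3, cost, B=4)` selects part sizes 4, 2 and 1 in some order. The table has $O(sr)$ entries and each entry scans at most $B$ candidates.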

The bound construction of $lb(R)$ has three phases. First, we compute the vector $m =
(m_1, \ldots, m_s)$ that minimizes $\sum_{i=1}^{s} C(m_i)$; this reflects an idealized assignment of numbers
of graph nodes to processors in such a way that the total number of edges cut (summed
over all processors) is minimized. Second, we compute a vector $w$ whose $i$th component
$w_i$ is the sum of the weights of the first $C(m_i)$ edges in $E$; $w$ reflects lower bounds on
communication costs under assignment $m$. Without loss of generality suppose that $w_1$ is the
largest component. We define the slack of $w$ as

$$\mathrm{slack}(w) = \sum_{i=2}^{s} (w_1 - w_i).$$

Figure 7: Computation of lower bound on bottleneck cost. (a) Slack is less than total computation; (b) Slack exceeds total computation.


Third,  we  consider  the  following  two  cases. 

The first case of interest is when $\mathrm{slack}(w) \le W_R$. This means that if we treat the total
computational workload $W_R$ as divisible into arbitrary pieces, we can give each processor
except the first enough workload to bring its total cost up to $w_1$, and still have workload
remaining. The remainder may be divided evenly among the $s$ processors. This is illustrated
in Figure 7(a). So

$$lb(R) = w_1 + \frac{W_R - \mathrm{slack}(w)}{s}.$$

The  correctness  of  the  bound  is  evident  by  the  fact  that  the  total  load  (sum  of  computation 
and  communication)  is  minimized,  and  that  no  processor  is  ever  idle. 
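
As a small illustration of this first case, the sketch below computes the slack of a communication vector and the resulting bound. The function name `lower_bound_case1` is hypothetical, and treating the boundary case (slack exactly equal to the total computation) as belonging to the first case is an assumption made here.

```python
def lower_bound_case1(w, W_R):
    """Bound for the case where the slack does not exceed the workload W_R.

    w   : per-processor communication lower bounds (any order)
    W_R : total computation weight of the region, treated as divisible
    Returns None when the slack exceeds W_R (the second case applies).
    """
    w1 = max(w)
    # workload needed to raise every processor up to the largest component;
    # the largest component itself contributes zero
    slack = sum(w1 - wi for wi in w)
    if slack > W_R:
        return None
    # the leftover workload is spread evenly over all processors
    return w1 + (W_R - slack) / len(w)
```

With `w = [5, 3, 2]` and `W_R = 11`, the slack is 5 and the bound is 7, which equals the total load (10 + 11) divided by the three processors, as expected when no processor is idle.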

The second case occurs when $\mathrm{slack}(w) > W_R$, as illustrated by Figure 7(b). In this case the
bottleneck is entirely communication induced, and the maximum number of nodes assigned
to a processor must be driven down. This may increase the total communication cost, but
will decrease the bottleneck cost. To reduce the bottleneck cost we constrain the assignment
so that $m_i \le B$ for all $i$; for each $B$ considered we may compute the slack of the corresponding
weight vector, and determine whether it exceeds $W_R$. Using a binary search on $B$ we may
find the least value $B^*$ such that the corresponding slack exceeds $W_R$. Let $w = (w_1, \ldots, w_s)$
and $w' = (w'_1, \ldots, w'_s)$ be the weight vectors derived from using $B^* - 1$ and $B^*$ as constraints,
respectively. Then we take the lower bound to be


$$lb(R) = \min\left\{\, w_1 + \frac{W_R - \mathrm{slack}(w)}{s},\; w'_1 \right\}.$$


We need not consider any bottleneck derived from using $B > B^*$, since the bottleneck cost
is monotone non-decreasing in $\max_i\{m_i\}$, which is monotone non-decreasing in $B$. We need
not consider any bottleneck derived from using $B < B^* - 1$, since in this case no processor is
idle, and the total communication cost is at least as large as that derived from using $B^* - 1$.

Clearly the solution of dynamic programming equations is the most expensive part of
this bound construction. It may be avoided by using lower bounds on the external edge count
function $C$ that have concave closed-form expressions. Such bounds have been developed in [10]:


Bk(mi,n ) 


< 


TO;n  —  m;  log  TTli 

2m in  —  mi  log  m{ 

o  i  (n— 1)/ 

2ra,n  —  n(mi  —  ’ 


71 


) 


for  k  =  2,  mi  <  2”; 
for  k  >  2,  mi  <  2”; 
for  k  >  2,  mi  >  2n. 


Since $B_k(m_i, n)$ is concave in $m_i$, the theory of majorization [6] tells us that to
minimize $\sum_{i=1}^{s} B_k(m_i, n)$ subject to $1 \le m_i \le B$ and $\sum_{i=1}^{s} m_i = r$ we assign $m_i = B$ for
$i = 1, 2, \ldots, \lfloor (r - s)/(B - 1) \rfloor$, with $m_{\lfloor (r-s)/(B-1) \rfloor + 1} = (r - s) \bmod (B - 1) + 1$, and $m_i = 1$
for the remainder.
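
The assignment described above is easy to state in code. The sketch below uses a hypothetical function name, `extreme_assignment`, and assumes $B \ge 2$ and $s \le r \le sB$ so that a feasible assignment exists.

```python
def extreme_assignment(r, s, B):
    """Most 'unequal' split of r nodes into s parts with 1 <= m_i <= B.

    As many parts as possible are filled to B, one part takes the leftover,
    and every remaining part gets a single node.
    """
    if B < 2 or not (s <= r <= s * B):
        raise ValueError("need B >= 2 and s <= r <= s*B")
    # every part starts with one node; r - s extra nodes are handed out
    # in chunks of B - 1, each chunk filling one part up to B
    full, rest = divmod(r - s, B - 1)
    sizes = [B] * full
    if full < s:
        sizes.append(rest + 1)               # the single partially filled part
        sizes.extend([1] * (s - full - 1))   # parts left with one node
    return sizes
```

For instance, `extreme_assignment(11, 4, 4)` returns `[4, 4, 2, 1]`. The part sizes always sum to $r$, since $\lfloor (r-s)/(B-1) \rfloor (B-1) + (r-s) \bmod (B-1) = r - s$.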

The  procedure  above  shows  how  to  bound  from  below  the  potential  least  bottleneck  cost 
for  each  region  reflected  by  node  N.  Applying  this  method  to  each  such  region,  we  define 
bnd(N)  as  the  greatest  of  these  lower  bounds,  i.e., 


$$bnd(N) = \max_{R}\,\{lb(R)\}.$$

It should be noted that for a given number of processors $P$, and a given total workload
$W_R$, the assignment problem whose minimized bottleneck cost is least is not necessarily one
where the workload is spread evenly. For instance, consider an 8 x 8 torus to be partitioned
into two regions. If each node has weight 4 and each edge has weight 1, then the optimal
solution is to bisect the graph into two equal pieces, at a cost of 4 x 32 + 16 = 144 (on a torus
the two halves are separated by two cuts of 8 edges each). However, the
graph that weights one node by 128 and all other nodes by 128/63 is optimally partitioned by
isolating the heavy node, at a cost of 128 + 4 = 132. The realization that minimized bottleneck
costs need not be associated with evenly spread workload (and equi-partitions) leads us to
the careful construction of $bnd(N)$ given.


6.2  Identification  of  Optimal  Partitions 

Another application of our results is to identify optimal partitions (with respect to the
bottleneck metric), even when those partitions are not entirely regular. Consider the problem of
partitioning $G_{8,2}$ (an 8 x 8 torus) into 13 subgraphs, assuming that all nodes have common
computation weight $w$ and all edges have unit communication cost. The problem clearly does
not divide evenly. The minimal cost to a processor of having $m$ nodes is $wm + C(m)$, where
$C(m)$, the external edge count of the optimal subgraph with $m$ nodes, is $4m - 2e_8(m, 2)$; note
that the cost function increases monotonically in $m$.

The processor with the most nodes assigned will have at least $\lceil 64/13 \rceil = 5$ nodes. The
optimal subgraph of $G_{8,2}$ with 5 nodes is a square with an attached singleton node. As
illustrated in Figure 8, it is possible to nearly tessellate $G_{8,2}$ with this optimal subgraph, the
only exception being one subgraph (the center square) which is itself a subgraph of the optimal
subgraph. The optimality of this partition derives from the fact that $wm + C(m)$ is monotone
non-decreasing in $m$, so that the bottleneck cost $\max\{w m_1 + C(m_1), \ldots, w m_{13} + C(m_{13})\}$ is
minimized when the $m_i$'s are nearly equal. The partition shown achieves the lower bound of
$5w + C(5) = 5w + 10$.
Figure 8: Optimal partition of $G_{8,2}$ into 13 subgraphs
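
The value $C(5) = 10$ used in this example can be checked directly by counting the external edges of a 2 x 2 square with one attached node in the 8 x 8 torus. The helper names below (`neighbors`, `external_edge_count`) and the particular coordinates chosen for the shape are illustrative assumptions; the code only counts edges for a given node set and does not search for an optimal subgraph.

```python
def neighbors(node, k):
    """Neighbors of a node (a tuple of base-k digits) in a k-ary n-cube."""
    result = set()
    for dim in range(len(node)):
        for step in (1, -1):
            nb = list(node)
            nb[dim] = (nb[dim] + step) % k
            nb = tuple(nb)
            if nb != node:          # k = 1 would map a node to itself
                result.add(nb)      # for k = 2 both steps give the same node
    return result

def external_edge_count(nodes, k):
    """Number of edges with exactly one endpoint inside `nodes`."""
    nodes = set(nodes)
    return sum(1 for v in nodes for u in neighbors(v, k) if u not in nodes)

# a 2 x 2 square with one extra node attached to a corner
shape = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0)]
largest_share = -(-64 // 13)        # ceiling of 64/13
```

Here `external_edge_count(shape, 8)` evaluates to 10 and `largest_share` to 5: the shape has 5 internal edges, so $4 \cdot 5 - 2 \cdot 5 = 10$ edges leave it.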

There is clearly a general principle at work here for uniformly weighted graphs. If there
are $M$ nodes to be assigned to $P$ processors, then at least one processor will receive at least
$m = \lceil M/P \rceil$ nodes. When the processor cost function is monotone non-decreasing as a function of
the number of nodes assigned to it, $wm + C(m)$ is a lower bound on the optimal bottleneck
cost, $C$ being the appropriate minimized function for communication cost. If it is possible
to partition the graph so that no processor has cost greater than $wm + C(m)$, then that
partition is optimal.

7  Conclusions 

A  subgraph  of  a  k-ary  n-cube  can  be  viewed  as  having  internal  edges  and  external  edges. 
This  paper  describes  how  to  construct  subgraphs  that  are  optimal  in  the  sense  of  maximizing 
the  number  of  internal  edges,  thus  minimizing  the  number  of  external  edges,  given  m  nodes 
in  the  subgraph.  While  these  results  have  combinatorial  interest,  they  also  have  serious 
applications  to  problems  in  parallel  processing.  We  show,  for  instance,  how  to  apply  these 
results  in  the  context  of  branch-and-bound  algorithms  for  partitioning  a  k-ary  n-cube  whose 
nodes and edges have general (positive) weights. Lower bounds lie at the heart of any
branch-and-bound algorithm, and our results provide the critical means needed to compute sharper
bounds than those that ignore communication overheads. We also show how our results can
be used to demonstrate the optimality of certain irregular partitions. k-ary n-cubes arise
frequently in studies of parallel processing; the results and applications developed here help
us to better understand these important graphs.

Appendix4 

Property 1.5 In the $i$th composite subcube ($0 \le i \le k - 1$) of type $G_{k,n-1}$ of $G_{k,n}$, choose
$m_i$ nodes, and define $m = \sum_{i=0}^{k-1} m_i$. The number of edges with endpoints among these $m$
nodes but in different composite subcubes is no larger than $\min\{m_0, m_1\}$ for $k = 2$, and is no
larger than $m - \max_{0 \le i \le k-1}\{m_i\} + \min_{0 \le i \le k-1}\{m_i\}$ for $k \ge 3$.

4To  referees:  Proofs  in  this  section  have  all  been  verified  by  programs. 
Proof We observe that if the $k$ composite subcubes of type $G_{k,n-1}$ are placed from left to
right, any node in one composite subcube is connected to exactly one node in each of its neighboring
composite subcubes. When $k = 2$, it is trivial that the number of edges with endpoints
among the $m$ nodes but in different composite subcubes is no larger than $\min\{m_0, m_1\}$. Now
consider $k \ge 3$. Clearly, the number of edges with endpoints among the $m$ nodes but in
different composite subcubes is no larger than

$$\min\{m_0, m_1\} + \min\{m_1, m_2\} + \cdots + \min\{m_{k-2}, m_{k-1}\} + \min\{m_{k-1}, m_0\}.$$

Take all indices modulo $k$, i.e., read $i + 1$ as $(i + 1) \bmod k$ and $i - 1$ as $(i - 1) \bmod k$. Let
$m_p = \max_{0 \le i \le k-1}\{m_i\}$ and $m_q = \min_{0 \le i \le k-1}\{m_i\}$. Place the $k$ pairs
$(m_0, m_1), (m_1, m_2), \ldots, (m_{k-2}, m_{k-1}), (m_{k-1}, m_0)$ in a circle clockwise. Cut the circle into
two chains $C_1$ and $C_2$ such that $C_1 = \{(m_p, m_{p+1}), \ldots, (m_{q-1}, m_q)\}$ and
$C_2 = \{(m_q, m_{q+1}), \ldots, (m_{p-1}, m_p)\}$. Clearly,

$$\sum_{(m_i, m_{i+1}) \in C_1} \min\{m_i, m_{i+1}\} \le \sum_{i=p+1}^{q} m_i$$

and

$$\sum_{(m_i, m_{i+1}) \in C_2} \min\{m_i, m_{i+1}\} \le \sum_{i=q}^{p-1} m_i.$$

Consequently,

$$\begin{aligned}
\sum_{i=0}^{k-1} \min\{m_i, m_{i+1}\}
&= \sum_{(m_i, m_{i+1}) \in C_1} \min\{m_i, m_{i+1}\} + \sum_{(m_i, m_{i+1}) \in C_2} \min\{m_i, m_{i+1}\} \\
&\le \sum_{i=p+1}^{q} m_i + \sum_{i=q}^{p-1} m_i \\
&= \sum_{i=0}^{k-1} m_i - m_p + m_q \\
&= m - \max_{0 \le i \le k-1}\{m_i\} + \min_{0 \le i \le k-1}\{m_i\}. \qquad \blacksquare
\end{aligned}$$
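
In the spirit of the program-based verification mentioned in the footnote, Property 1.5 can be checked exhaustively on very small cubes. The sketch below is not the authors' program; the function name `property_1_5_holds` is hypothetical, and it assumes that the first address digit selects the composite subcube.

```python
from itertools import product

def property_1_5_holds(k, n):
    """Exhaustively test Property 1.5 on every node subset of a k-ary n-cube.

    Assumes k >= 2 and that the first coordinate is the composite-subcube index.
    """
    nodes = list(product(range(k), repeat=n))
    for mask in range(1 << len(nodes)):
        chosen = {v for t, v in enumerate(nodes) if (mask >> t) & 1}
        # m_i: number of chosen nodes in composite subcube i
        counts = [sum(1 for v in chosen if v[0] == i) for i in range(k)]
        # edges between different composite subcubes with both ends chosen;
        # frozenset avoids counting an edge twice when k = 2
        cross = set()
        for v in chosen:
            u = ((v[0] + 1) % k,) + v[1:]
            if u in chosen:
                cross.add(frozenset((v, u)))
        m = len(chosen)
        bound = min(counts) if k == 2 else m - max(counts) + min(counts)
        if len(cross) > bound:
            return False
    return True
```

Calling `property_1_5_holds(2, 3)` or `property_1_5_holds(3, 2)` enumerates 256 and 512 subsets respectively; the cost grows as $2^{k^n}$, so this is only practical for tiny cases.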
Lemma 2.1 $W(i, 2i - 1) = W(0, i - 1) + i$ for $i \ge 1$.

Proof We induct on $i$. When $i = 1$, it is trivial that $W(1, 1) = W(0, 0) + 1$. Assume that
the equation holds for all values up to $i - 1$. Now consider $i$.

$$\begin{aligned}
W(i, 2i - 1) &= W(i - 1, 2i - 3) + w(2i - 2) + w(2i - 1) - w(i - 1) \\
&= W(0, i - 2) + (i - 1) + w(2i - 2) + w(2i - 1) - w(i - 1) && \text{(inductive hypothesis)} \\
&= W(0, i - 2) + (i - 1) + w(2i - 1) && \text{(since } w(2i - 2) = w(i - 1)\text{)} \\
&= W(0, i - 2) + (i - 1) + w(i - 1) + 1 && \text{(since } w(2i - 1) = w(i - 1) + 1\text{)} \\
&= W(0, i - 1) + i. \qquad \blacksquare
\end{aligned}$$

Lemma 2.2 $W(i + 1, 2i) = W(0, i - 1) + i$ for $i \ge 1$.

Proof Straightforward by using Lemma 2.1. ■

Lemma 2.3 $W(j, j + i - 1) \ge W(0, i - 1) + i$ for $j \ge i \ge 1$.

Proof We induct on $i$. When $i = 1$, it is obvious that $W(j, j) \ge W(0, 0) + 1$. Assume that
the inequality holds for all values up to $i - 1$. That is, for $j' \ge i' \ge 1$ and $i' \le i - 1$,

$$W(j', j' + i' - 1) \ge W(0, i' - 1) + i'. \qquad (1)$$

An important implication of the inductive hypothesis is that when $j' + i' \le 2^b$ for some $b$, if
we replace all parameters of $W$ in (1) by their $(2^b - 1)$-complements, we have

$$W(2^b - i', 2^b - 1) \ge W(2^b - j' - i', 2^b - j' - 1) + i'. \qquad (2)$$

Now consider $i$.

Case 1. There exists $2^b$ in $(j, j + i - 1]$ for some $b$, and $j + i - 1$ also has $b + 1$ bits and
starts with 1 in its base-2 representation. By (1) and (2),

$$W(j, 2^b - 1) \ge W(j + i - 2^b, i - 1) + (2^b - j). \qquad (3)$$

Removing the highest bit 1 from $2^b, \ldots, j + i - 1$,

$$W(2^b, j + i - 1) = W(0, j + i - 2^b - 1) + (j + i - 2^b). \qquad (4)$$

Adding (3) and (4),

$$W(j, j + i - 1) \ge W(0, i - 1) + i.$$

Case 2. There is no number equal to $2^b$ in $(j, j + i - 1]$ for any $b$. We then know that
$j, \ldots, j + i - 1$ must all have the same number of bits, say $b + 1$, and the same highest bit
1. Let $p_1 \cdots p_t$ be the longest common prefix of the base-2 representations of $j, \ldots, j + i - 1$.
Let $p = p_1 \cdot 2^b + \cdots + p_t \cdot 2^{b-t+1}$. Clearly, $p \le j$ and $p_1 \ge 1$. Removing the highest $t$ bits
$p_1 \cdots p_t$ from $j, \ldots, j + i - 1$,

$$W(j, j + i - 1) \ge W(j - p, j + i - p - 1) + i.$$

Now we wish to show that $W(j - p, j + i - p - 1) \ge W(0, i - 1)$, or equivalently
$W(j - p, j + i - p - 1) - W(0, i - 1) \ge 0$. If $j - p < i$,

$$\begin{aligned}
W(j - p, j + i - p - 1) - W(0, i - 1) &= W(i, j + i - p - 1) - W(0, j - p - 1) \\
&\ge j - p && \text{(by (1))} \\
&\ge 0.
\end{aligned}$$

If $j - p \ge i$, there must exist $2^{b'}$ in $(j - p, j + i - p - 1]$ for some $b' < b$ since the highest bit
in $j - p$ is not the same as the highest bit in $j + i - p - 1$. By Case 1,

$$W(j - p, j + i - p - 1) - W(0, i - 1) \ge i > 0. \qquad \blacksquare$$
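
Lemmas 2.1 to 2.3 lend themselves to a quick numerical check. The sketch below is not the authors' verification program. It assumes, based on the identities $w(2i-2) = w(i-1)$ and $w(2i-1) = w(i-1) + 1$ used in the proof of Lemma 2.1, that $w(i)$ is the number of 1 bits in the binary representation of $i$ (with $w(0) = 0$) and that $W(i, j) = \sum_{t=i}^{j} w(t)$; the function names are hypothetical.

```python
def w(i):
    """Number of 1 bits of i (assumed meaning of w)."""
    return bin(i).count("1")

def W(i, j):
    """Sum of w(t) for t = i, ..., j (assumed meaning of W)."""
    return sum(w(t) for t in range(i, j + 1))

def lemmas_2_hold(limit=64):
    """Check Lemmas 2.1, 2.2 and 2.3 for all parameters up to `limit`."""
    for i in range(1, limit + 1):
        if W(i, 2 * i - 1) != W(0, i - 1) + i:          # Lemma 2.1
            return False
        if W(i + 1, 2 * i) != W(0, i - 1) + i:          # Lemma 2.2
            return False
        for j in range(i, limit + 1):
            if W(j, j + i - 1) < W(0, i - 1) + i:       # Lemma 2.3
                return False
    return True
```

A finite check of this kind only confirms the statements on the tested range; it complements the inductive proofs rather than replacing them.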

Lemma 3.1 $Z(0, 3i - 1) = 3Z(0, i - 1) + 3i$ for $i \ge 1$.

Proof We induct on $i$. When $i = 1$, $Z(0, 2) = 3Z(0, 0) + 3 = 3$. Assume that the equation
holds for all values up to $i - 1$. Now consider $i$.

$$\begin{aligned}
Z(0, 3i - 1) &= Z(0, 3i - 4) + z(3i - 3) + z(3i - 2) + z(3i - 1) \\
&= 3Z(0, i - 2) + 3(i - 1) + z(3i - 3) + z(3i - 2) + z(3i - 1) && \text{(inductive hypothesis)} \\
&= 3Z(0, i - 2) + 3(i - 1) + z(i - 1) + z(3i - 2) + z(3i - 1) && \text{(since } z(3i - 3) = z(i - 1)\text{)} \\
&= 3Z(0, i - 2) + 3(i - 1) + z(i - 1) + z(i - 1) + 1 + z(3i - 1) && \text{(since } z(3i - 2) = z(i - 1) + 1\text{)} \\
&= 3Z(0, i - 2) + 3(i - 1) + z(i - 1) + z(i - 1) + 1 + z(i - 1) + 2 && \text{(since } z(3i - 1) = z(i - 1) + 2\text{)} \\
&= 3Z(0, i - 1) + 3i. \qquad \blacksquare
\end{aligned}$$

Lemma 3.2 $Z(0, 3i) = Z(0, i) + 2Z(0, i - 1) + 3i$ for $i \ge 1$.

Proof Straightforward by using Lemma 3.1. ■

Lemma 3.3 $Z(0, 3i + 1) = 2Z(0, i) + Z(0, i - 1) + 3i + 1$ for $i \ge 1$.

Proof Straightforward by using Lemma 3.2. ■

Lemma 3.4 $Z(j, j + i - 1) \ge Z(0, i - 1) + i$ for $j \ge i \ge 1$.

Proof Use the proof of Lemma 2.3, but change $W$ to $Z$ and base-2 to base-3. To be more
specific, in the inductive step, consider the following two cases.

Case 1. There exists $3^b$ or $2 \cdot 3^b$ in $(j, j + i - 1]$ for some $b$, and $j + i - 1$ also has $b + 1$
bits and starts with 1 or 2, respectively, in its base-3 representation.

Case 2. There is no number equal to $3^b$ or $2 \cdot 3^b$ in $(j, j + i - 1]$ for any $b$. ■

Before proving Lemma 3.5, we need the following claims.

Claim 1 $Z(j, j + i - 1) \ge Z(0, i - 1)$ for $j \ge 0$ and $i \ge 1$.

Proof Trivial by Lemma 3.4. ■

Claim 2 $Z(l - i, l - 1) \ge Z(l - j - i, l - j - 1) + i$ for $j \ge i \ge 1$, $j + i \le l$ and $l = 3^b$ or $2 \cdot 3^b$
for some $b$.

Proof Replace all parameters of $Z$ in Lemma 3.4 by their $(l - 1)$-complements. ■

Claim 3 $Z(l - i, l - 1) \ge Z(l - j - i, l - j - 1)$ for $j \ge 0$, $i \ge 1$, $j + i \le l$ and $l = 3^b$ or $2 \cdot 3^b$
for some $b$.

Proof Replace all parameters of $Z$ in Claim 1 by their $(l - 1)$-complements. ■

Lemma 3.5 $Z(j, j + i_1 + i_2 - 1) \ge Z(0, i_1 - 1) + Z(0, i_2 - 1) + i_1 + 2i_2$ for $j \ge i_1 \ge i_2 \ge 1$.

Proof We induct on $i_1 + i_2$. When $i_1 + i_2 = 2$, we must have $i_1 = i_2 = 1$. It is obvious that
$Z(j, j + 1) \ge Z(0, 0) + Z(0, 0) + 3$. Assume that the inequality holds for all values up to $i_1 + i_2 - 1$. That
is, for $j' \ge i'_1 \ge i'_2 \ge 1$ and $i'_1 + i'_2 \le i_1 + i_2 - 1$,

$$Z(j', j' + i'_1 + i'_2 - 1) \ge Z(0, i'_1 - 1) + Z(0, i'_2 - 1) + i'_1 + 2i'_2. \qquad (5)$$

An important implication of the inductive hypothesis is that when $j' + i'_1 + i'_2 \le l$ and $l = 3^b$
or $2 \cdot 3^b$ for some $b$, if we replace all parameters of $Z$ in (5) by their $(l - 1)$-complements, we
have

$$Z(l - i'_1, l - 1) + Z(l - i'_2, l - 1) \ge Z(l - j' - i'_1 - i'_2, l - j' - 1) + i'_1 + 2i'_2. \qquad (6)$$

Now consider $i_1 + i_2$.

Case 1. There exists $2 \cdot 3^b$ in $(j, j + i_1 + i_2 - 1]$ for some $b$, and $j + i_1 + i_2 - 1$ also has
$b + 1$ bits and starts with 2 in its base-3 representation.

Subcase 1.1. There is $3^b$ in $(j, 2 \cdot 3^b)$. Removing the highest bit 1 from $3^b, \ldots, 3^b + i_1 - 1$,

$$Z(3^b, 3^b + i_1 - 1) = Z(0, i_1 - 1) + i_1. \qquad (7)$$

Removing the highest bit 2 from $2 \cdot 3^b, \ldots, j + i_1 + i_2 - 1$,

$$Z(2 \cdot 3^b, j + i_1 + i_2 - 1) = Z(0, j + i_1 + i_2 - 2 \cdot 3^b - 1) + 2(j + i_1 + i_2 - 2 \cdot 3^b). \qquad (8)$$

Removing the highest bit 1 from $3^b + i_1, \ldots, 2 \cdot 3^b - 1$,

$$Z(3^b + i_1, 2 \cdot 3^b - 1) = Z(i_1, 3^b - 1) + (3^b - i_1). \qquad (9)$$

Next,

$$\begin{aligned}
Z(j, 3^b - 1) + Z(3^b + i_1, 2 \cdot 3^b - 1)
&= Z(j, 3^b - 1) + Z(i_1, 3^b - 1) + (3^b - i_1) && \text{(by (9))} \\
&\ge Z(j + i_1 + i_2 - 2 \cdot 3^b, i_2 - 1) + (3^b - i_1) + 2(3^b - j) + (3^b - i_1) && \text{(by (5), (6))}. \qquad (10)
\end{aligned}$$

Adding (7), (8) and (10),

$$Z(j, j + i_1 + i_2 - 1) \ge Z(0, i_1 - 1) + Z(0, i_2 - 1) + i_1 + 2i_2.$$
Subcase 1.2. There is no number equal to $3^b$ in $(j, 2 \cdot 3^b)$. We then know that $j$ must have
$b + 1$ bits and be at least $3^b$.

Subsubcase 1.2.1. Assume that $j + i_1 + i_2 - 2 \cdot 3^b \ge i_2$. Removing the highest bit 2 from
$2 \cdot 3^b, \ldots, 2 \cdot 3^b + i_2 - 1$,

$$Z(2 \cdot 3^b, 2 \cdot 3^b + i_2 - 1) = Z(0, i_2 - 1) + 2i_2. \qquad (11)$$

Removing the highest bit 2 from $2 \cdot 3^b + i_2, \ldots, j + i_1 + i_2 - 1$,

$$\begin{aligned}
Z(2 \cdot 3^b + i_2, j + i_1 + i_2 - 1) &= Z(i_2, j + i_1 + i_2 - 2 \cdot 3^b - 1) + 2(j + i_1 - 2 \cdot 3^b) \\
&\ge Z(0, j + i_1 - 2 \cdot 3^b - 1) + 2(j + i_1 - 2 \cdot 3^b) && \text{(Claim 1)} \\
&= Z(2 \cdot 3^b, j + i_1 - 1). \qquad (12)
\end{aligned}$$

Therefore,

$$\begin{aligned}
Z(j, j + i_1 + i_2 - 1)
&= Z(j, 2 \cdot 3^b - 1) + Z(2 \cdot 3^b, 2 \cdot 3^b + i_2 - 1) + Z(2 \cdot 3^b + i_2, j + i_1 + i_2 - 1) \\
&\ge Z(j, 2 \cdot 3^b - 1) + Z(0, i_2 - 1) + Z(2 \cdot 3^b, j + i_1 - 1) + 2i_2 && \text{(by (11), (12))} \\
&= Z(j, j + i_1 - 1) + Z(0, i_2 - 1) + 2i_2 \\
&\ge Z(0, i_1 - 1) + Z(0, i_2 - 1) + i_1 + 2i_2 && \text{(Lemma 3.4)}.
\end{aligned}$$

Subsubcase 1.2.2. Assume that $j + i_1 + i_2 - 2 \cdot 3^b < i_2$. Removing the highest bit 2 from
$2 \cdot 3^b, \ldots, j + i_1 + i_2 - 1$,

$$Z(2 \cdot 3^b, j + i_1 + i_2 - 1) = Z(0, j + i_1 + i_2 - 2 \cdot 3^b - 1) + 2(j + i_1 + i_2 - 2 \cdot 3^b). \qquad (13)$$

By Lemma 3.4,

$$Z(j, j + i_1 - 1) \ge Z(0, i_1 - 1) + i_1. \qquad (14)$$

Removing the highest bit 1 from $j + i_1, \ldots, 2 \cdot 3^b - 1$,

$$\begin{aligned}
Z(j + i_1, 2 \cdot 3^b - 1) &= Z(j + i_1 - 3^b, 3^b - 1) + (2 \cdot 3^b - j - i_1) \\
&\ge Z(j + i_1 + i_2 - 2 \cdot 3^b, i_2 - 1) + 2(2 \cdot 3^b - j - i_1) && \text{(Claim 2)}. \qquad (15)
\end{aligned}$$

Adding (13), (14) and (15),

$$Z(j, j + i_1 + i_2 - 1) \ge Z(0, i_1 - 1) + Z(0, i_2 - 1) + i_1 + 2i_2.$$

Case 2. There exists $3^b$ in $(j, j + i_1 + i_2 - 1]$ for some $b$, and $j + i_1 + i_2 - 1$ also has $b + 1$
bits and starts with 1 in its base-3 representation.

Subcase 2.1. There is $2 \cdot 3^{b-1}$ in $(j, 3^b)$.

Subsubcase 2.1.1. Assume that $i_2 \le 3^{b-1}$. Removing the highest bit 2 from
$2 \cdot 3^{b-1}, \ldots, 2 \cdot 3^{b-1} + i_2 - 1$,

$$Z(2 \cdot 3^{b-1}, 2 \cdot 3^{b-1} + i_2 - 1) = Z(0, i_2 - 1) + 2i_2. \qquad (16)$$

Removing the highest bit 1 from $3^b, \ldots, j + i_1 + i_2 - 1$,

$$Z(3^b, j + i_1 + i_2 - 1) = Z(0, j + i_1 + i_2 - 3^b - 1) + (j + i_1 + i_2 - 3^b). \qquad (17)$$

By Claim 2,

$$Z(j, 2 \cdot 3^{b-1} - 1) \ge Z(j + i_1 - 2 \cdot 3^{b-1}, i_1 - 1) + (2 \cdot 3^{b-1} - j) \qquad (18)$$

and

$$Z(2 \cdot 3^{b-1} + i_2, 3^b - 1) \ge Z(j + i_1 + i_2 - 3^b, j + i_1 - 2 \cdot 3^{b-1} - 1) + (3^{b-1} - i_2). \qquad (19)$$

Adding (16), (17), (18) and (19),

$$Z(j, j + i_1 + i_2 - 1) \ge Z(0, i_1 - 1) + Z(0, i_2 - 1) + i_1 + 2i_2.$$

Subsubcase 2.1.2. Assume that $i_2 > 3^{b-1}$. Then $j \ge i_1 \ge i_2 > 3^{b-1}$, and

$$\begin{aligned}
Z(j - 3^{b-1}, j + i_1 + i_2 - 3^b - 1)
\ge{}& Z(0, i_1 - 3^{b-1} - 1) + Z(0, i_2 - 3^{b-1} - 1) \\
&+ (i_1 - 3^{b-1}) + 2(i_2 - 3^{b-1}) && \text{(by (5))}. \qquad (20)
\end{aligned}$$

Let $h = \min\{3^b + 2 \cdot 3^{b-1}, j + i_1 + i_2\}$. Then

$$Z(h - 2 \cdot 3^{b-1}, h - 1) \ge Z(0, 3^{b-1} - 1) + Z(0, 3^{b-1} - 1) + 3^{b-1} + 2(3^{b-1}) \qquad \text{(by (5))}. \qquad (21)$$

Subtracting $3^{b-1}$ from $j, \ldots, h - 2 \cdot 3^{b-1} - 1$,

$$Z(j, h - 2 \cdot 3^{b-1} - 1) = Z(j - 3^{b-1}, h - 3^b - 1) + (h - j - 2 \cdot 3^{b-1}). \qquad (22)$$

In the case of $h = 3^b + 2 \cdot 3^{b-1}$, removing the highest bit 1 from $3^b + 2 \cdot 3^{b-1}, \ldots, j + i_1 + i_2 - 1$,

$$Z(h, j + i_1 + i_2 - 1) = Z(2 \cdot 3^{b-1}, j + i_1 + i_2 - 3^b - 1) + (j + i_1 + i_2 - 3^b - 2 \cdot 3^{b-1}). \qquad (23)$$

Adding (22) and (23),

$$\begin{aligned}
Z(j, h - 2 \cdot 3^{b-1} - 1) + Z(h, j + i_1 + i_2 - 1)
&= Z(j - 3^{b-1}, j + i_1 + i_2 - 3^b - 1) + (i_1 + i_2 - 2 \cdot 3^{b-1}) \\
&\ge Z(0, i_1 - 3^{b-1} - 1) + Z(0, i_2 - 3^{b-1} - 1) + (i_1 - 3^{b-1}) + 2(i_2 - 3^{b-1}) \\
&\quad + (i_1 + i_2 - 2 \cdot 3^{b-1}) && \text{(by (20))} \\
&= Z(3^{b-1}, i_1 - 1) + Z(3^{b-1}, i_2 - 1) + (i_1 - 3^{b-1}) + 2(i_2 - 3^{b-1}). \qquad (24)
\end{aligned}$$

Adding (21) and (24),

$$Z(j, j + i_1 + i_2 - 1) \ge Z(0, i_1 - 1) + Z(0, i_2 - 1) + i_1 + 2i_2.$$
Subcase 2.2. There is no number equal to $2 \cdot 3^{b-1}$ in $(j, 3^b)$. We then know that $j$ must
be at least $2 \cdot 3^{b-1}$.

Subsubcase 2.2.1. Assume that $j + i_1 + i_2 - 3^b \ge i_1$. Removing the highest bit 1 from
$3^b, \ldots, 3^b + i_1 - 1$,

$$Z(3^b, 3^b + i_1 - 1) = Z(0, i_1 - 1) + i_1. \qquad (25)$$

Removing the highest bit 1 from $3^b + i_1, \ldots, j + i_1 + i_2 - 1$,

$$\begin{aligned}
Z(3^b + i_1, j + i_1 + i_2 - 1) &= Z(i_1, j + i_1 + i_2 - 3^b - 1) + (j + i_2 - 3^b) \\
&\ge Z(0, j + i_2 - 3^b - 1) + 2(j + i_2 - 3^b) && \text{(Lemma 3.4)}. \qquad (26)
\end{aligned}$$

Removing the highest bit 2 from $j, \ldots, 3^b - 1$,

$$\begin{aligned}
Z(j, 3^b - 1) &= Z(j - 2 \cdot 3^{b-1}, 3^{b-1} - 1) + 2(3^b - j) \\
&\ge Z(j + i_2 - 3^b, i_2 - 1) + 2(3^b - j) && \text{(Claim 3)}. \qquad (27)
\end{aligned}$$

Adding (25), (26) and (27),

$$Z(j, j + i_1 + i_2 - 1) \ge Z(0, i_1 - 1) + Z(0, i_2 - 1) + i_1 + 2i_2.$$

Subsubcase 2.2.2. Assume that $j + i_1 + i_2 - 3^b < i_1$. Removing the highest bit 1 from
$3^b, \ldots, j + i_1 + i_2 - 1$,

$$Z(3^b, j + i_1 + i_2 - 1) = Z(0, j + i_1 + i_2 - 3^b - 1) + (j + i_1 + i_2 - 3^b). \qquad (28)$$

Removing the highest bit 2 from $j, \ldots, j + i_2 - 1$,

$$\begin{aligned}
Z(j, j + i_2 - 1) &= Z(j - 2 \cdot 3^{b-1}, j + i_2 - 2 \cdot 3^{b-1} - 1) + 2i_2 \\
&\ge Z(0, i_2 - 1) + 2i_2 && \text{(Claim 1)}. \qquad (29)
\end{aligned}$$

By Claim 2,

$$Z(j + i_2, 3^b - 1) \ge Z(j + i_1 + i_2 - 3^b, i_1 - 1) + (3^b - j - i_2). \qquad (30)$$

Adding (28), (29) and (30),

$$Z(j, j + i_1 + i_2 - 1) \ge Z(0, i_1 - 1) + Z(0, i_2 - 1) + i_1 + 2i_2.$$

Case 3. There is no number equal to $3^b$ or $2 \cdot 3^b$ in $(j, j + i_1 + i_2 - 1]$ for any $b$. We then
know that $j, \ldots, j + i_1 + i_2 - 1$ must all have the same number of bits, say $b + 1$, and the same
highest bit 1 or 2. Let $p_1 \cdots p_t$ be the longest common prefix of the base-3 representations of
$j, \ldots, j + i_1 + i_2 - 1$. Let $p = p_1 \cdot 3^b + \cdots + p_t \cdot 3^{b-t+1}$. Clearly, $p \le j$ and $p_1 \ge 1$. Removing
the highest $t$ bits $p_1 \cdots p_t$ from $j, \ldots, j + i_1 + i_2 - 1$,

$$Z(j, j + i_1 + i_2 - 1) \ge Z(j - p, j + i_1 + i_2 - p - 1) + (i_1 + i_2).$$

Now we wish to show that $Z(j - p, j + i_1 + i_2 - p - 1) \ge Z(0, i_1 - 1) + Z(0, i_2 - 1) + i_2$, or
equivalently, $Z(j - p, j + i_1 + i_2 - p - 1) - Z(0, i_1 - 1) - Z(0, i_2 - 1) \ge i_2$. If $j - p < i_1$,

$$\begin{aligned}
Z(j - p, j + i_1 + i_2 - p - 1) - Z(0, i_1 - 1) - Z(0, i_2 - 1)
&= Z(i_1, j + i_1 + i_2 - p - 1) - Z(0, j - p - 1) - Z(0, i_2 - 1) \\
&\ge \max\{j - p, i_2\} + 2\min\{j - p, i_2\} && \text{(by (5))} \\
&\ge i_2.
\end{aligned}$$

If $j - p \ge i_1$, there must exist $3^{b'}$ or $2 \cdot 3^{b'}$ in $(j - p, j + i_1 + i_2 - p - 1]$ for some $b' < b$ since
the highest bit in $j - p$ is not the same as the highest bit in $j + i_1 + i_2 - p - 1$. By Cases 1
and 2,

$$Z(j - p, j + i_1 + i_2 - p - 1) - Z(0, i_1 - 1) - Z(0, i_2 - 1) \ge i_1 + 2i_2 \ge i_2. \qquad \blacksquare$$
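
The base-3 statements can be spot-checked numerically in the same way as the base-2 ones. The sketch below is not the authors' verification program. It assumes, from the identities $z(3i-3) = z(i-1)$, $z(3i-2) = z(i-1) + 1$ and $z(3i-1) = z(i-1) + 2$ used in the proof of Lemma 3.1, that $z(i)$ is the sum of the base-3 digits of $i$ and that $Z(i, j) = \sum_{t=i}^{j} z(t)$; the function names are hypothetical.

```python
def z(i):
    """Sum of the base-3 digits of i (assumed meaning of z)."""
    total = 0
    while i:
        total += i % 3
        i //= 3
    return total

def Z(i, j):
    """Sum of z(t) for t = i, ..., j (assumed meaning of Z)."""
    return sum(z(t) for t in range(i, j + 1))

def lemmas_3_hold(limit=12, j_max=60):
    """Check Lemmas 3.1-3.5 on a small parameter range."""
    for i in range(1, limit + 1):
        if Z(0, 3 * i - 1) != 3 * Z(0, i - 1) + 3 * i:                  # 3.1
            return False
        if Z(0, 3 * i) != Z(0, i) + 2 * Z(0, i - 1) + 3 * i:            # 3.2
            return False
        if Z(0, 3 * i + 1) != 2 * Z(0, i) + Z(0, i - 1) + 3 * i + 1:    # 3.3
            return False
        for j in range(i, j_max + 1):
            if Z(j, j + i - 1) < Z(0, i - 1) + i:                       # 3.4
                return False
    for i2 in range(1, limit + 1):
        for i1 in range(i2, limit + 1):
            rhs = Z(0, i1 - 1) + Z(0, i2 - 1) + i1 + 2 * i2
            for j in range(i1, j_max + 1):
                if Z(j, j + i1 + i2 - 1) < rhs:                         # 3.5
                    return False
    return True
```

The default range is deliberately small so that the check runs quickly; widening `limit` and `j_max` extends the tested range at cubic cost.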
Lemma 4.1 $W(0, 4i - 1) = 4W(0, i - 1) + 4i$ for $i \ge 1$.

Proof We induct on $i$. When $i = 1$, $W(0, 3) = 4W(0, 0) + 4 = 4$. Assume that the equation
holds for all values up to $i - 1$. Now consider $i$.

$$\begin{aligned}
W(0, 4i - 1)
&= W(0, 4i - 5) + w(4i - 4) + w(4i - 3) + w(4i - 2) + w(4i - 1) \\
&= 4W(0, i - 2) + (4i - 4) + w(4i - 4) + w(4i - 3) + w(4i - 2) + w(4i - 1) && \text{(inductive hypothesis)} \\
&= 4W(0, i - 2) + (4i - 4) + w(i - 1) + w(4i - 3) + w(4i - 2) + w(4i - 1) && \text{(since } w(4i - 4) = w(i - 1)\text{)} \\
&= 4W(0, i - 2) + (4i - 4) + w(i - 1) + w(i - 1) + 1 + w(4i - 2) + w(4i - 1) && \text{(since } w(4i - 3) = w(i - 1) + 1\text{)} \\
&= 4W(0, i - 2) + (4i - 4) + w(i - 1) + w(i - 1) + 1 + w(i - 1) + 1 + w(4i - 1) && \text{(since } w(4i - 2) = w(i - 1) + 1\text{)} \\
&= 4W(0, i - 2) + (4i - 4) + w(i - 1) + w(i - 1) + 1 + w(i - 1) + 1 + w(i - 1) + 2 && \text{(since } w(4i - 1) = w(i - 1) + 2\text{)} \\
&= 4W(0, i - 1) + 4i. \qquad \blacksquare
\end{aligned}$$

Lemma 4.2 $W(0, 4i) = W(0, i) + 3W(0, i - 1) + 4i$ for $i \ge 1$.

Proof Straightforward by using Lemma 4.1. ■

Lemma 4.3 $W(0, 4i + 1) = 2W(0, i) + 2W(0, i - 1) + 4i + 1$ for $i \ge 1$.

Proof Straightforward by using Lemma 4.2. ■

Lemma 4.4 $W(0, 4i + 2) = 3W(0, i) + W(0, i - 1) + 4i + 2$ for $i \ge 1$.

Proof Straightforward by using Lemma 4.3. ■
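
Lemmas 4.1 to 4.4 can be spot-checked in the same manner. The sketch assumes that $w$ here is again the number of 1 bits in the binary representation, which agrees with the identities $w(4i-4) = w(i-1)$, $w(4i-3) = w(i-1) + 1$, $w(4i-2) = w(i-1) + 1$ and $w(4i-1) = w(i-1) + 2$ used above; the function names are hypothetical and this is not the authors' verification program.

```python
def bit_weight(i):
    """Number of 1 bits of i (assumed meaning of w in Lemmas 4.1-4.4)."""
    return bin(i).count("1")

def prefix_weight(j):
    """W(0, j): sum of bit_weight(t) for t = 0, ..., j."""
    return sum(bit_weight(t) for t in range(j + 1))

def lemmas_4_hold(limit=200):
    """Check Lemmas 4.1-4.4 for i = 1, ..., limit."""
    for i in range(1, limit + 1):
        a, b = prefix_weight(i), prefix_weight(i - 1)
        if prefix_weight(4 * i - 1) != 4 * b + 4 * i:              # Lemma 4.1
            return False
        if prefix_weight(4 * i) != a + 3 * b + 4 * i:              # Lemma 4.2
            return False
        if prefix_weight(4 * i + 1) != 2 * a + 2 * b + 4 * i + 1:  # Lemma 4.3
            return False
        if prefix_weight(4 * i + 2) != 3 * a + b + 4 * i + 2:      # Lemma 4.4
            return False
    return True
```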